HAPLOTYPES OF THE SLC26A2 GENE

RELATED APPLICATIONS 

This application claims the benefit of U.S. Provisional Application Serial No. 60/213,284 filed June 22, 2000.

FIELD OF THE INVENTION 

This invention relates to variation in genes that encode pharmaceutically-important proteins. In particular, this invention provides genetic variants of the human solute carrier family 26, member 2 (SLC26A2) gene and methods for identifying which variant(s) of this gene is/are possessed by an individual.

BACKGROUND OF THE INVENTION

Current methods for identifying pharmaceuticals to treat disease often start by identifying, cloning, and expressing an important target protein related to the disease. A determination of whether an agonist or antagonist is needed to produce an effect that may benefit a patient with the disease is then made. Then, vast numbers of compounds are screened against the target protein to find new potential drugs. The desired outcome of this process is a lead compound that is specific for the target, thereby reducing the incidence of the undesired side effects usually caused by activity at non-intended targets. The lead compound identified in this screening process then undergoes further in vitro and in vivo testing to determine its absorption, disposition, metabolism and toxicological profiles. Typically, this testing involves use of cell lines and animal models with limited, if any, genetic diversity.

What this approach fails to consider, however, is that natural genetic variability exists between individuals in any and every population with respect to pharmaceutically-important proteins, including the protein targets of candidate drugs, the enzymes that metabolize these drugs and the proteins whose activity is modulated by such drug targets. Subtle alteration(s) in the primary nucleotide sequence of a gene encoding a pharmaceutically-important protein may be manifested as significant variation in expression, structure and/or function of the protein. Such alterations may explain the relatively high degree of uncertainty inherent in the treatment of individuals with a drug whose design is based upon a single representative example of the target or enzyme(s) involved in metabolizing the drug. For example, it is well-established that some drugs frequently have lower efficacy in some individuals than others, which means such individuals and their physicians must weigh the possible benefit of a larger dosage against a greater risk of side effects. Also, there is significant variation in how well people metabolize drugs and other exogenous chemicals, resulting in substantial interindividual variation in the toxicity and/or efficacy of such exogenous substances (Evans et al., 1999, Science 286:487-491).

This variability in efficacy or toxicity of a drug in genetically-diverse patients makes many drugs ineffective or even dangerous in certain groups of the population, leading to the failure of such drugs in clinical trials or their early withdrawal from the market even though they could be highly beneficial for other groups in the population. This problem significantly increases the time and cost of drug discovery and development, which is a matter of great public concern.

It is well-recognized by pharmaceutical scientists that considering the impact of the genetic variability of pharmaceutically-important proteins in the early phases of drug discovery and development is likely to reduce the failure rate of candidate and approved drugs (Marshall A 1997 Nature Biotech 15:1249-52; Kleyn PW et al. 1998 Science 281:1820-21; Kola I 1999 Curr Opin Biotech 10:589-92; Hill AVS et al. 1999 in Evolution in Health and Disease, Stearns SS (Ed.), Oxford University Press, New York, pp 62-76; Meyer UA 1999 in Evolution in Health and Disease, Stearns SS (Ed.), Oxford University Press, New York, pp 41-49; Kalow W et al. 1999 Clin. Pharm. Therap. 66:445-7; Marshall E 1999 Science 284:406-7; Judson R et al. 2000 Pharmacogenomics 1:1-12; Roses AD 2000 Nature 405:857-65). However, in practice this has been difficult to do, in large part because of the time and cost required for discovering the amount of genetic variation that exists in the population (Chakravarti A 1998 Nature Genet 19:216-7; Wang DG et al. 1998 Science 280:1077-82; Chakravarti A 1999 Nat Genet 21:56-60 (suppl); Stephens JC 1999 Mol Diagnosis 4:309-317; Kwok PY and Gu S 1999 Mol. Med Today 5:538-43; Davidson S 2000 Nature Biotech 18:1134-5).

The standard for measuring genetic variation among individuals is the haplotype, which is the ordered combination of polymorphisms in the sequence of each form of a gene that exists in the population. Because haplotypes represent the variation across each form of a gene, they provide a more accurate and reliable measurement of genetic variation than individual polymorphisms. For example, while specific variations in gene sequences have been associated with a particular phenotype such as disease susceptibility (Roses AD supra; Ulbrecht M et al. 2000 Am J Respir Crit Care Med 161:469-74) and drug response (Wolfe CR et al. 2000 BMJ 320:987-90; Dahl BS 1997 Acta Psychiatr Scand 96 (Suppl 391):14-21), in many other cases an individual polymorphism may be found in a variety of genomic backgrounds, i.e., different haplotypes, and therefore shows no definitive coupling between the polymorphism and the causative site for the phenotype (Clark AG et al. 1998 Am J Hum Genet 63:595-612; Ulbrecht M et al. 2000 supra; Drysdale et al. 2000 PNAS 97:10483-10488). Thus, there is an unmet need in the pharmaceutical industry for information on what haplotypes exist in the population for pharmaceutically-important genes. Such haplotype information would be useful in improving the efficiency and output of several steps in the drug discovery and development process, including target validation, identifying lead compounds, and early phase clinical trials (Marshall et al., supra).

One pharmaceutically-important gene for the treatment of osteochondrodysplasias is the solute carrier family 26, member 2 (SLC26A2) gene or its encoded product. The transport of sulfates into connective tissue cells, especially chondrocytes, is predominantly dependent upon the transporter encoded by the SLC26A2 gene (OMIM entry: 222600). Sulfate transport is an integral factor in the normal formation and maintenance of cartilage and bone, wherein a steady supply of sulfates is necessary for the synthesis of the chondroitin sulfate chains attached to proteoglycans. The resulting matrix creates a viscous gel that is largely responsible for the ability of cartilage and bone to absorb large compressive loads (Watanabe et al., 1998, J. Biochem. (Tokyo) 124:687-93).

Impairment of sulfate transport across the cell membrane leads to insufficient sulfation of cartilage proteoglycans, thereby diminishing the sulfate content of cartilage and disrupting the process of endochondral bone formation (Satoh H et al. 1998 J. Biol. Chem. 273(20):12307-15; Superti-Furga et al., 1996, Am. J. Med. Genet. 63:144-7). A substantial body of evidence demonstrates that mutations in SLC26A2, in particular, underlie a pleiotropic family of recessively inherited osteochondrodysplasias including achondrogenesis type IB, atelosteogenesis type II, and diastrophic dysplasia (Rossi A et al. 1998 Matrix Biol. 17(5):361-9; Satoh H et al. supra). These osteochondrodysplasias exhibit a range of pathological severity and comprise a diverse spectrum of clinical presentations. Distinguishing features of these disorders include scoliosis, clubbed feet, cleft palate, congenital heart defects and shortened, malformed limbs and digits characteristic of diastrophic dwarfism (OMIM entry: 222600).

The solute carrier family 26, member 2 gene is located on chromosome 5q31-q34 and contains 2 exons that encode a 739 amino acid protein. A reference sequence for the SLC26A2 gene is shown in Figure 1 (Genaissance Contig No. 3758668; SEQ ID NO: 1). Reference sequences for the coding sequence (GenBank Accession No. NM_000112.1) and protein are shown in Figures 2 (SEQ ID NO: 2) and 3 (SEQ ID NO: 3), respectively.

There is one single nucleotide polymorphism in SLC26A2 that has been reported previously in the literature (NCBI SNP ID: rs30832). This polymorphism corresponds to the site named PS4 herein, consisting of a cytosine or thymine at nucleotide position 140013 in Figure 1. This variation is expressed in the coding sequence at nucleotide position 1721 in Figure 2, giving rise to either a threonine or isoleucine variant at amino acid position 574 in Figure 3.
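The link between a coding-sequence position in Figure 2 and an amino acid position in Figure 3 is codon arithmetic. A minimal sketch, assuming nucleotide 1 of SEQ ID NO: 2 is the first base of the start codon (the numbering convention of the figures is not restated in this section, so treat it as an assumption):

```python
from typing import Tuple


def codon_of(cdna_pos: int) -> Tuple[int, int]:
    """Map a 1-based coding-sequence position to (codon number, base within codon).

    Assumes position 1 is the first base of the start codon, so codon n
    spans positions 3n-2 through 3n.
    """
    if cdna_pos < 1:
        raise ValueError("coding-sequence positions are 1-based")
    codon = (cdna_pos - 1) // 3 + 1
    base_in_codon = (cdna_pos - 1) % 3 + 1  # 1, 2 or 3
    return codon, base_in_codon


# PS4: coding position 1721 is reported to change amino acid 574.
print(codon_of(1721))  # (574, 2)
```

Under this assumption PS4 falls on the second base of codon 574. A C-to-T change at that base turns a threonine codon (ACN) into ATN, which encodes isoleucine unless the third base is G, consistent with the Thr/Ile variant described above.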

Because of the potential for variation in the SLC26A2 gene to affect the expression and function of the encoded protein, it would be useful to know whether additional polymorphisms exist in the SLC26A2 gene, as well as how such polymorphisms are combined in different copies of the gene. Such information could be applied for studying the biological function of SLC26A2 as well as in identifying drugs targeting this protein for the treatment of disorders related to its abnormal expression or function.



SUMMARY OF THE INVENTION

Accordingly, the inventors herein have discovered 4 novel polymorphic sites in the SLC26A2 gene. These polymorphic sites (PS) correspond to the following nucleotide positions in Figure 1: 136098 (PS1), 136195 (PS2), 139338 (PS3) and 140357 (PS5). The polymorphisms at these sites are guanine or adenine at PS1, adenine or guanine at PS2, thymine or adenine at PS3 and adenine or




WO 01/98318 PCT/US01/20028



thymine at PS5. In addition, the inventors have determined the identity of the alleles at these sites, as well as at the previously identified site at nucleotide position 140013 (PS4), in a human reference population of 79 unrelated individuals self-identified as belonging to one of four major population groups: African descent, Asian, Caucasian and Hispanic/Latino. From this information, the inventors deduced a set of haplotypes and haplotype pairs for PS1-PS5 in the SLC26A2 gene, which are shown below in Tables 5 and 4, respectively. Each of these SLC26A2 haplotypes defines a naturally-occurring isoform (also referred to herein as an "isogene") of the SLC26A2 gene that exists in the human population. The frequency with which each haplotype and haplotype pair occurs within the total reference population and within each of the four major population groups included in the reference population was also determined.
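Because every individual carries two copies of the gene, haplotype frequencies are naturally computed over the 2N chromosomes of N individuals, while haplotype-pair frequencies are computed over the N individuals themselves. A minimal sketch; the records and the five-letter haplotype strings (alleles at PS1-PS5, 5' to 3') are hypothetical placeholders, not the contents of Tables 4 and 5:

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical record: (population group, (haplotype on copy 1, haplotype on copy 2)).
Record = Tuple[str, Tuple[str, str]]


def haplotype_frequencies(records: List[Record]) -> Dict[str, float]:
    """Frequency of each haplotype among all 2N chromosomes."""
    counts = Counter(h for _, pair in records for h in pair)
    total = sum(counts.values())
    return {h: n / total for h, n in counts.items()}


def pair_frequencies(records: List[Record]) -> Dict[Tuple[str, ...], float]:
    """Frequency of each unordered haplotype pair among N individuals."""
    counts = Counter(tuple(sorted(pair)) for _, pair in records)
    return {p: n / len(records) for p, n in counts.items()}


def by_group(records: List[Record]) -> Dict[str, Dict[str, float]]:
    """Haplotype frequencies computed separately within each population group."""
    groups: Dict[str, List[Record]] = {}
    for rec in records:
        groups.setdefault(rec[0], []).append(rec)
    return {g: haplotype_frequencies(rs) for g, rs in groups.items()}


records = [
    ("Asian", ("GATCA", "GATCA")),
    ("Asian", ("GATCA", "AGATT")),
    ("Caucasian", ("GATTA", "GATCA")),
]
print(haplotype_frequencies(records)["GATCA"])  # 4 of 6 chromosomes
```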

Thus, in one embodiment, the invention provides a method, composition and kit for genotyping the SLC26A2 gene in an individual. The genotyping method comprises identifying the nucleotide pair that is present at one or more polymorphic sites selected from the group consisting of PS1, PS2, PS3 and PS5 in both copies of the SLC26A2 gene from the individual. A genotyping composition of the invention comprises an oligonucleotide probe or primer which is designed to specifically hybridize to a target region containing, or adjacent to, one of these novel SLC26A2 polymorphic sites. A genotyping kit of the invention comprises a set of oligonucleotides designed to genotype each of these novel SLC26A2 polymorphic sites. In a preferred embodiment, the genotyping kit comprises a set of oligonucleotides designed to genotype each of PS1-PS5. The genotyping method, composition, and kit are useful in determining whether an individual has one of the haplotypes in Table 5 below or has one of the haplotype pairs in Table 4 below.
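Genotyping yields an unphased nucleotide pair at each site, so a genotype can be compared with a candidate haplotype pair site by site while ignoring which copy carries which nucleotide. A minimal sketch; `KNOWN_HAPLOTYPES` is a hypothetical stand-in for the haplotypes of Table 5:

```python
from itertools import combinations_with_replacement
from typing import FrozenSet, List, Tuple

# Hypothetical haplotypes over PS1..PS5; the actual set is the one in Table 5.
KNOWN_HAPLOTYPES = ["GATCA", "GATTA", "AGATT"]


def genotype_of(h1: str, h2: str) -> List[FrozenSet[str]]:
    """Unphased genotype: the unordered nucleotide pair at each site."""
    return [frozenset((a, b)) for a, b in zip(h1, h2)]


def consistent_pairs(genotype: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """All pairs of known haplotypes whose combined nucleotides match the genotype."""
    target = [frozenset(site) for site in genotype]
    return [
        (h1, h2)
        for h1, h2 in combinations_with_replacement(KNOWN_HAPLOTYPES, 2)
        if genotype_of(h1, h2) == target
    ]


# An individual heterozygous C/T at PS4 and homozygous at the other sites.
observed = [("G", "G"), ("A", "A"), ("T", "T"), ("C", "T"), ("A", "A")]
print(consistent_pairs(observed))  # [('GATCA', 'GATTA')]
```

When an individual is heterozygous at two or more sites, more than one pair can match; resolving that phase ambiguity is the purpose of the haplotyping methods described next.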

The invention also provides a method for haplotyping the SLC26A2 gene in an individual. In one embodiment, the haplotyping method comprises determining, for one copy of the SLC26A2 gene, the identity of the nucleotide at one or more polymorphic sites selected from the group consisting of PS1, PS2, PS3 and PS5. In another embodiment, the haplotyping method comprises determining whether one copy of the individual's SLC26A2 gene is defined by one of the SLC26A2 haplotypes shown in Table 5, below, or a sub-haplotype thereof. In a preferred embodiment, the haplotyping method comprises determining whether both copies of the individual's SLC26A2 gene are defined by one of the SLC26A2 haplotype pairs shown in Table 4 below, or a sub-haplotype pair thereof. The method for establishing the SLC26A2 haplotype or haplotype pair of an individual is useful for improving the efficiency and reliability of several steps in the discovery and development of drugs for treating diseases associated with SLC26A2 activity, e.g., osteochondrodysplasias.
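A sub-haplotype specifies nucleotides at only a subset of the sites, so deciding whether one copy of the gene is defined by a haplotype or a sub-haplotype thereof reduces to checking agreement at the sites that were actually determined. A minimal sketch; the haplotype strings are hypothetical and the function names are illustrative only:

```python
from typing import Dict, List

SITES = ["PS1", "PS2", "PS3", "PS4", "PS5"]


def as_sites(haplotype: str) -> Dict[str, str]:
    """Label a PS1..PS5 allele string with its site names."""
    return dict(zip(SITES, haplotype))


def matches(observed: Dict[str, str], haplotype: str) -> bool:
    """True if every determined site in `observed` agrees with `haplotype`."""
    full = as_sites(haplotype)
    return all(full[site] == allele for site, allele in observed.items())


def candidates(observed: Dict[str, str], haplotypes: List[str]) -> List[str]:
    """Haplotypes of which the observed (sub-)haplotype is a sub-haplotype."""
    return [h for h in haplotypes if matches(observed, h)]


# One copy typed only at PS3 and PS5, i.e. a sub-haplotype.
print(candidates({"PS3": "A", "PS5": "T"}, ["GATCA", "GATTA", "AGATT"]))  # ['AGATT']
```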

For example, the haplotyping method can be used by the pharmaceutical research scientist to validate SLC26A2 as a candidate target for treating a specific condition or disease predicted to be associated with SLC26A2 activity. Determining for a particular population the frequency of one or more of the individual SLC26A2 haplotypes or haplotype pairs described herein will facilitate a decision on whether to pursue SLC26A2 as a target for treating the specific disease of interest. In particular, if variable SLC26A2 activity is associated with the disease, then one or more SLC26A2 haplotypes or haplotype pairs will be found at a higher frequency in disease cohorts than in appropriately genetically matched controls. Conversely, if each of the observed SLC26A2 haplotypes is of similar frequency in the disease and control groups, then it may be inferred that variable SLC26A2 activity has little, if any, involvement with that disease. In either case, the pharmaceutical research scientist can, without a priori knowledge as to the phenotypic effect of any SLC26A2 haplotype or haplotype pair, apply the information derived from detecting SLC26A2 haplotypes in an individual to decide whether modulating SLC26A2 activity would be useful in treating the disease.
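The comparison described above amounts to tabulating each haplotype's frequency in the disease cohort alongside its frequency in matched controls. A minimal sketch; both cohorts are hypothetical lists of haplotype strings (two entries per individual), and no significance test is applied here:

```python
from collections import Counter
from typing import Dict, List, Tuple


def compare_cohorts(cases: List[str], controls: List[str]) -> Dict[str, Tuple[float, float]]:
    """Map each haplotype to (frequency in cases, frequency in controls)."""
    case_counts, ctrl_counts = Counter(cases), Counter(controls)
    haplotypes = sorted(set(case_counts) | set(ctrl_counts))
    return {
        h: (case_counts[h] / len(cases), ctrl_counts[h] / len(controls))
        for h in haplotypes
    }


cases = ["AGATT", "AGATT", "GATCA", "AGATT"]
controls = ["GATCA", "GATCA", "AGATT", "GATCA"]
for hap, (f_case, f_ctrl) in compare_cohorts(cases, controls).items():
    print(hap, f_case, f_ctrl)
```

Similar values in the two columns for every haplotype correspond to the "little, if any, involvement" outcome; in practice a statistical test would be applied before drawing either conclusion.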

The claimed invention is also useful in screening for compounds targeting SLC26A2 to treat a specific condition or disease predicted to be associated with SLC26A2 activity. For example, detecting which of the SLC26A2 haplotypes or haplotype pairs disclosed herein are present in individual members of a population with the specific disease of interest enables the pharmaceutical scientist to screen for a compound(s) that displays the highest desired agonist or antagonist activity for each of the most frequent SLC26A2 isoforms present in the disease population. Thus, without requiring any a priori knowledge of the phenotypic effect of any particular SLC26A2 haplotype or haplotype pair, the claimed haplotyping method provides the scientist with a tool to identify lead compounds that are more likely to show efficacy in clinical trials.
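Choosing "the most frequent SLC26A2 isoforms" can be framed as taking haplotypes in descending order of frequency in the disease population until a chosen fraction of chromosomes is covered. A minimal sketch; the coverage threshold and the chromosome list are illustrative assumptions, not values from this disclosure:

```python
from collections import Counter
from typing import List


def isoforms_to_screen(chromosomes: List[str], coverage: float = 0.9) -> List[str]:
    """Most frequent haplotypes whose cumulative frequency reaches `coverage`."""
    counts = Counter(chromosomes)
    total = sum(counts.values())
    chosen, covered = [], 0
    for haplotype, n in counts.most_common():
        chosen.append(haplotype)
        covered += n
        if covered / total >= coverage:
            break
    return chosen


disease_chromosomes = ["GATCA"] * 6 + ["AGATT"] * 3 + ["GATTA"]
print(isoforms_to_screen(disease_chromosomes, coverage=0.8))  # ['GATCA', 'AGATT']
```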

The method for haplotyping the SLC26A2 gene in an individual is also useful in the design of clinical trials of candidate drugs for treating a specific condition or disease predicted to be associated with SLC26A2 activity. For example, instead of randomly assigning patients with the disease of interest to the treatment or control group as is typically done now, determining which of the SLC26A2 haplotype(s) disclosed herein are present in individual patients enables the pharmaceutical scientist to distribute SLC26A2 haplotypes and/or haplotype pairs evenly to treatment and control groups, thereby reducing the potential for bias in the results that could be introduced by a larger frequency of a SLC26A2 haplotype or haplotype pair that had a previously unknown association with response to the drug being studied in the trial. Thus, by practicing the claimed invention, the scientist can more confidently rely on the information learned from the trial, without first determining the phenotypic effect of any SLC26A2 haplotype or haplotype pair.
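One standard way to distribute haplotype pairs evenly between arms is stratified randomization: group patients by haplotype pair, shuffle within each stratum, and alternate assignment. This is a common trial-design technique offered as an illustration, not a procedure this disclosure prescribes; patient identifiers and pairs below are hypothetical:

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def stratified_assign(patients: Dict[str, Tuple[str, str]], seed: int = 0) -> Dict[str, str]:
    """Assign patients to 'treatment' or 'control', alternating within haplotype-pair strata."""
    rng = random.Random(seed)
    strata: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for pid, pair in patients.items():
        strata[tuple(sorted(pair))].append(pid)  # an unordered haplotype pair is one stratum
    arms: Dict[str, str] = {}
    offset = 0  # flips after an odd-sized stratum so leftover patients alternate arms
    for key in sorted(strata):
        members = sorted(strata[key])
        rng.shuffle(members)  # random order within the stratum
        for i, pid in enumerate(members):
            arms[pid] = "treatment" if (i + offset) % 2 == 0 else "control"
        offset = (offset + len(members)) % 2
    return arms


patients = {
    "P1": ("GATCA", "GATCA"),
    "P2": ("GATCA", "GATCA"),
    "P3": ("GATCA", "AGATT"),
    "P4": ("AGATT", "GATCA"),
}
print(stratified_assign(patients))
```

Each stratum splits as evenly as its size allows, so no haplotype pair can end up concentrated in one arm by chance.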

In another embodiment, the invention provides a method for identifying an association between a trait and a SLC26A2 genotype, haplotype, or haplotype pair for one or more of the novel polymorphic sites described herein. The method comprises comparing the frequency of the SLC26A2 genotype, haplotype, or haplotype pair in a population exhibiting the trait with the frequency of the SLC26A2 genotype, haplotype, or haplotype pair in a reference population. A higher frequency of the SLC26A2 genotype, haplotype, or haplotype pair in the trait population than in the reference population indicates the trait is associated with the SLC26A2 genotype, haplotype, or haplotype pair. In preferred embodiments, the trait is susceptibility to a disease, severity of a disease, the staging of a disease or response to a drug. In a particularly preferred embodiment, the SLC26A2 haplotype is selected from the haplotypes shown in Table 5, or a sub-haplotype thereof. Such methods have applicability in developing diagnostic tests and therapeutic treatments for osteochondrodysplasias.
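For a single haplotype, the frequency comparison in this association method can be summarized as a 2x2 table of chromosomes carrying or not carrying it in the trait and reference populations. A minimal sketch computing the odds ratio with an approximate 95% confidence interval by Woolf's log method; the choice of statistic and the counts are illustrative assumptions:

```python
import math
from typing import Tuple


def odds_ratio(trait_with: int, trait_without: int,
               ref_with: int, ref_without: int) -> Tuple[float, float, float]:
    """Odds ratio and approximate 95% CI (Woolf method) from 2x2 chromosome counts.

    All four cells must be positive.
    """
    cells = (trait_with, trait_without, ref_with, ref_without)
    if min(cells) <= 0:
        raise ValueError("all cells must be positive; consider a continuity correction")
    or_ = (trait_with * ref_without) / (trait_without * ref_with)
    se = math.sqrt(sum(1 / c for c in cells))  # standard error of log(OR)
    lo, hi = (math.exp(math.log(or_) + z * se) for z in (-1.96, 1.96))
    return or_, lo, hi


# Hypothetical: 30 of 80 trait chromosomes vs 15 of 100 reference chromosomes carry the haplotype.
print(odds_ratio(30, 50, 15, 85))
```

A lower confidence bound above 1 is consistent with the higher frequency in the trait population that the method looks for.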

In yet another embodiment, the invention provides an isolated polynucleotide comprising a nucleotide sequence which is a polymorphic variant of a reference sequence for the SLC26A2 gene or a fragment thereof. The reference sequence comprises the contiguous sequences shown in Figure 1 (SEQ ID NO: 1) and the polymorphic variant comprises at least one polymorphism selected from the group consisting of adenine at PS1, guanine at PS2, adenine at PS3 and thymine at PS5. In a preferred embodiment, the polymorphic variant comprises an additional polymorphism of thymine at PS4. A particularly preferred polymorphic variant is an isogene of the SLC26A2 gene. A SLC26A2 isogene of the invention comprises guanine or adenine at PS1, adenine or guanine at PS2, thymine or adenine at PS3, cytosine or thymine at PS4 and adenine or thymine at PS5. The invention also provides a collection of SLC26A2 isogenes, referred to herein as a SLC26A2 genome anthology.
In another embodiment, the invention provides a polynucleotide comprising a polymorphic variant of a reference sequence for a SLC26A2 cDNA or a fragment thereof. The reference sequence comprises SEQ ID NO: 2 (Fig. 2) and the polymorphic cDNA comprises at least one polymorphism selected from the group consisting of adenine at a position corresponding to nucleotide 1046 and thymine at a position corresponding to nucleotide 2065. In a preferred embodiment, the polymorphic variant comprises an additional polymorphism of thymine at a position corresponding to nucleotide 1721. A particularly preferred polymorphic cDNA variant comprises the coding sequence of a SLC26A2 isogene defined by haplotypes 1, 2, 4 and 5.

Polynucleotides complementary to these SLC26A2 genomic and cDNA variants are also 
provided by the invention. It is believed that polymorphic variants of the SLC26A2 gene will be 
useful in studying the expression and function of SLC26A2, and in expressing SLC26A2 protein for 
use in screening for candidate drugs to treat diseases related to SLC26A2 activity. 

In other embodiments, the invention provides a recombinant expression vector comprising one of the polymorphic genomic variants operably linked to expression regulatory elements as well as a recombinant host cell transformed or transfected with the expression vector. The recombinant vector and host cell may be used to express SLC26A2 for protein structure analysis and drug binding studies.

In yet another embodiment, the invention provides a polypeptide comprising a polymorphic variant of a reference amino acid sequence for the SLC26A2 protein. The reference amino acid sequence comprises SEQ ID NO: 3 (Fig. 3) and the polymorphic variant comprises at least one variant amino acid selected from the group consisting of tyrosine at a position corresponding to amino acid position 349 and serine at a position corresponding to amino acid position 689. In some embodiments, the polymorphic variant also comprises isoleucine at a position corresponding to amino acid position 574. A polymorphic variant of SLC26A2 is useful in studying the effect of the variation on the biological activity of SLC26A2 as well as on the binding affinity of candidate drugs targeting SLC26A2 for the treatment of osteochondrodysplasias.
The present invention also provides antibodies that recognize and bind to the above polymorphic SLC26A2 protein variant. Such antibodies can be utilized in a variety of diagnostic and prognostic formats and therapeutic methods.

The present invention also provides nonhuman transgenic animals comprising one of the SLC26A2 polymorphic genomic variants described herein and methods for producing such animals. The transgenic animals are useful for studying expression of the SLC26A2 isogenes in vivo, for in vivo screening and testing of drugs targeted against SLC26A2 protein, and for testing the efficacy of therapeutic agents and compounds for osteochondrodysplasias in a biological system.

The present invention also provides a computer system for storing and displaying polymorphism data determined for the SLC26A2 gene. The computer system comprises a computer processing unit; a display; and a database containing the polymorphism data. The polymorphism data includes the polymorphisms, the genotypes and the haplotypes identified for the SLC26A2 gene in a reference population. In a preferred embodiment, the computer system is capable of producing a display showing SLC26A2 haplotypes organized according to their evolutionary relationships.
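The database component of such a computer system could be organized as a few related tables. A minimal sketch using Python's built-in `sqlite3` module; the schema, table names and column names are hypothetical and are not taken from this disclosure:

```python
import sqlite3
from typing import Dict

SCHEMA = """
CREATE TABLE site (
    name TEXT PRIMARY KEY,          -- e.g. 'PS1'
    position INTEGER NOT NULL,      -- position in the reference sequence
    ref_allele TEXT NOT NULL,
    alt_allele TEXT NOT NULL
);
CREATE TABLE haplotype (
    id INTEGER PRIMARY KEY,
    alleles TEXT NOT NULL UNIQUE    -- alleles at PS1..PS5, 5' to 3'
);
CREATE TABLE subject_pair (
    subject TEXT NOT NULL,
    population_group TEXT NOT NULL,
    hap1 INTEGER NOT NULL REFERENCES haplotype(id),
    hap2 INTEGER NOT NULL REFERENCES haplotype(id)
);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def haplotype_counts(conn: sqlite3.Connection, group: str) -> Dict[str, int]:
    """Count chromosomes carrying each haplotype within one population group."""
    rows = conn.execute(
        """
        SELECT h.alleles, COUNT(*) FROM (
            SELECT hap1 AS hid FROM subject_pair WHERE population_group = ?
            UNION ALL
            SELECT hap2 FROM subject_pair WHERE population_group = ?
        ) AS c JOIN haplotype AS h ON h.id = c.hid
        GROUP BY h.alleles
        """,
        (group, group),
    )
    return dict(rows.fetchall())


conn = open_db()
conn.executemany("INSERT INTO haplotype (id, alleles) VALUES (?, ?)", [(1, "GATCA"), (2, "AGATT")])
conn.executemany(
    "INSERT INTO subject_pair VALUES (?, ?, ?, ?)",
    [("S1", "Asian", 1, 1), ("S2", "Asian", 1, 2), ("S3", "Caucasian", 2, 2)],
)
print(haplotype_counts(conn, "Asian"))
```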


BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 illustrates a reference sequence for the SLC26A2 gene (Genaissance Reference No. 3758668; contiguous lines; SEQ ID NO: 1), with the start and stop positions of each region of coding sequence indicated with a bracket ([ or ]) and the numerical position below the sequence, and with the polymorphic site(s) and polymorphism(s) identified by Applicants in a reference population indicated by the variant nucleotide positioned below the polymorphic site in the sequence. SEQ ID NO: 24 is equivalent to Figure 1, with the two alternative allelic variants of each polymorphic site indicated by the appropriate nucleotide symbol (R = G or A, Y = T or C, M = A or C, K = G or T, S = G or C, and W = A or T; WIPO standard ST.25). SEQ ID NO: 25 is a modified version of SEQ ID NO: 24 that shows the context sequence of each polymorphic site, PS1-PS5, in a uniform format to facilitate electronic searching. For each polymorphic site, SEQ ID NO: 25 contains a block of 60 bases of the nucleotide sequence encompassing the centrally-located polymorphic site at the 30th position, followed by 60 bases of unspecified sequence to represent that each PS is separated by genomic sequence whose composition is defined elsewhere herein.
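The uniform context format of SEQ ID NO: 25 can be sketched as follows: take 60 bases with the polymorphic site at the 30th position, write the site as its two-allele ambiguity symbol from the list above, and append 60 unspecified bases. A minimal sketch, assuming 1-based site positions and using "N" for unspecified bases (an assumption about notation); the toy sequence is not SLC26A2 sequence:

```python
# Two-allele ambiguity symbols, as listed for SEQ ID NO: 24.
AMBIGUITY = {
    frozenset("GA"): "R", frozenset("TC"): "Y", frozenset("AC"): "M",
    frozenset("GT"): "K", frozenset("GC"): "S", frozenset("AT"): "W",
}


def context_block(sequence: str, position: int, allele_a: str, allele_b: str) -> str:
    """60 bases with the site as the 30th base, followed by 60 unspecified bases."""
    start = position - 30  # 0-based index of the window's first base
    end = start + 60
    if start < 0 or end > len(sequence):
        raise ValueError("site is too close to a sequence end for a full 60-base window")
    window = list(sequence[start:end].upper())
    # The site sits at 0-based index position - 1, i.e. index 29 of the window.
    window[29] = AMBIGUITY[frozenset((allele_a.upper(), allele_b.upper()))]
    return "".join(window) + "N" * 60


toy = "ACGT" * 30  # 120-base toy sequence; base 50 is C
print(context_block(toy, 50, "C", "T")[25:35])
```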

Figure 2 illustrates a reference sequence for the SLC26A2 coding sequence (contiguous lines; SEQ ID NO: 2), with the polymorphic site(s) and polymorphism(s) identified by Applicants in a reference population indicated by the variant nucleotide positioned below the polymorphic site in the sequence.

Figure 3 illustrates a reference sequence for the SLC26A2 protein (contiguous lines; SEQ ID NO: 3), with the variant amino acid(s) caused by the polymorphism(s) of Figure 2 positioned below the polymorphic site in the sequence.



DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention is based on the discovery of novel variants of the SLC26A2 gene. As described in more detail below, the inventors herein discovered 5 isogenes of the SLC26A2 gene by characterizing the SLC26A2 gene found in genomic DNAs isolated from an Index Repository that contains immortalized cell lines from one chimpanzee and 93 human individuals. The human individuals included a reference population of 79 unrelated individuals self-identified as belonging to one of four major population groups: Caucasian (21 individuals), African descent (20 individuals), Asian (20 individuals), or Hispanic/Latino (18 individuals). To the extent possible, the members of this reference population were organized into population subgroups by their self-identified ethnogeographic origin as shown in Table 1 below.



Table 1. Population Groups in the Index Repository

Population Group    Population Subgroup                   No. of Individuals
African descent                                           20
                    Sierra Leone                          1
Asian                                                     20
                    Burma                                 1
                    China                                 3
                    Japan                                 6
                    Korea                                 1
                    Philippines                           5
                    Vietnam                               4
Caucasian                                                 21
                    British Isles                         3
                    British Isles/Central                 4
                    British Isles/Eastern                 1
                    Central/Eastern                       1
                    Eastern                               3
                    Central/Mediterranean                 1
                    Mediterranean                         2
                    Scandinavian                          2
Hispanic/Latino                                           18
                    Caribbean                             8
                    Caribbean (Spanish Descent)           2
                    Central American (Spanish Descent)    1
                    Mexican American                      4
                    South American (Spanish Descent)      3



In addition, the Index Repository contains three unrelated indigenous American Indians (one from each of North, Central and South America), one three-generation Caucasian family (from the CEPH Utah cohort) and one two-generation African-American family.

The SLC26A2 isogenes present in the human reference population are defined by haplotypes for 5 polymorphic sites in the SLC26A2 gene, 4 of which are believed to be novel. The SLC26A2 polymorphic sites identified by the inventors are referred to as PS1-PS5 to designate the order in which they are located in the gene (see Table 3 below), with the novel polymorphic sites referred to as PS1, PS2, PS3 and PS5. Using the genotypes identified in the Index Repository for PS1-PS5 and the methodology described in the Examples below, the inventors herein also determined the pair of haplotypes for the SLC26A2 gene present in individual human members of this repository. The human genotypes and haplotypes found in the repository for the SLC26A2 gene include those shown in Tables 4 and 5, respectively. The polymorphism and haplotype data disclosed herein are useful for validating whether SLC26A2 is a suitable target for drugs to treat osteochondrodysplasias, screening for such drugs and reducing bias in clinical trials of such drugs.

In the context of this disclosure, the following terms shall be defined as follows unless 
otherwise indicated: 

Allele - A particular form of a genetic locus, distinguished from other forms by its particular nucleotide sequence.

Candidate Gene - A gene which is hypothesized to be responsible for a disease, condition, or 
the response to a treatment, or to be correlated with one of these. 

Gene - A segment of DNA that contains all the information for the regulated biosynthesis of an RNA product, including promoters, exons, introns, and other untranslated regions that control expression.

Genotype - An unphased 5' to 3' sequence of nucleotide pair(s) found at one or more polymorphic sites in a locus on a pair of homologous chromosomes in an individual. As used herein, genotype includes a full-genotype and/or a sub-genotype as described below.

Full-genotype - The unphased 5' to 3' sequence of nucleotide pairs found at all polymorphic sites examined herein in a locus on a pair of homologous chromosomes in a single individual.

Sub-genotype - The unphased 5' to 3' sequence of nucleotides seen at a subset of the polymorphic sites examined herein in a locus on a pair of homologous chromosomes in a single individual.

Genotyping - A process for determining a genotype of an individual.

Haplotype - A 5' to 3' sequence of nucleotides found at one or more polymorphic sites in a locus on a single chromosome from a single individual. As used herein, haplotype includes a full-haplotype and/or a sub-haplotype as described below.

Full-haplotype - The 5' to 3' sequence of nucleotides found at all polymorphic sites examined herein in a locus on a single chromosome from a single individual.

Sub-haplotype - The 5' to 3' sequence of nucleotides seen at a subset of the polymorphic sites examined herein in a locus on a single chromosome from a single individual.

Haplotype pair - The two haplotypes found for a locus in a single individual.

Haplotyping - A process for determining one or more haplotypes in an individual and includes use of family pedigrees, molecular techniques and/or statistical inference.

Haplotype data - Information concerning one or more of the following for a specific gene: a listing of the haplotype pairs in each individual in a population; a listing of the different haplotypes in a population; frequency of each haplotype in that or other populations; and any known associations between one or more haplotypes and a trait.

Isoform - A particular form of a gene, mRNA, cDNA or the protein encoded thereby, distinguished from other forms by its particular sequence and/or structure.

Isogene - One of the isoforms of a gene found in a population. An isogene contains all of the polymorphisms present in the particular isoform of the gene.

Isolated - As applied to a biological molecule such as RNA, DNA, oligonucleotide, or protein, isolated means the molecule is substantially free of other biological molecules such as nucleic acids, proteins, lipids, carbohydrates, or other material such as cellular debris and growth media. Generally, the term "isolated" is not intended to refer to a complete absence of such material or to absence of water, buffers, or salts, unless they are present in amounts that substantially interfere with the methods of the present invention.

Locus - A location on a chromosome or DNA molecule corresponding to a gene or a physical 
or phenotypic feature. 

Naturally-occurring - A term used to designate that the object it is applied to, e.g., naturally-occurring polynucleotide or polypeptide, can be isolated from a source in nature and has not been intentionally modified by man.

Nucleotide pair - The nucleotides found at a polymorphic site on the two copies of a chromosome from an individual.

Phased - As applied to a sequence of nucleotide pairs for two or more polymorphic sites in a locus, phased means the combination of nucleotides present at those polymorphic sites on a single copy of the locus is known.

Polymorphic site (PS) - A position within a locus at which at least two alternative sequences are found in a population, the most frequent of which has a frequency of no more than 99%.
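The 99% criterion in this definition can be applied directly to the allele counts observed at a candidate position. A minimal sketch; the function name and the counts are hypothetical:

```python
from collections import Counter
from typing import Iterable


def is_polymorphic_site(alleles: Iterable[str], max_major_freq: float = 0.99) -> bool:
    """True if at least two alleles occur and the most frequent is at most 99% of observations."""
    counts = Counter(alleles)
    if len(counts) < 2:
        return False
    major = counts.most_common(1)[0][1]
    return major / sum(counts.values()) <= max_major_freq


# 158 chromosomes (79 individuals): a lone minor allele gives a major-allele frequency of ~0.994.
print(is_polymorphic_site("C" * 157 + "T"))  # False
print(is_polymorphic_site("C" * 150 + "T" * 8))  # True
```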

Polymorphic variant - A gene, mRNA, cDNA, polypeptide or peptide whose nucleotide or 
amino acid sequence varies from a reference sequence due to the presence of a polymorphism in the 
gene. 

Polymorphism - The sequence variation observed in an individual at a polymorphic site. Polymorphisms include nucleotide substitutions, insertions, deletions and microsatellites and may, but need not, result in detectable differences in gene expression or protein function.

Polymorphism data - Information concerning one or more of the following for a specific gene: location of polymorphic sites; sequence variation at those sites; frequency of polymorphisms in one or more populations; the different genotypes and/or haplotypes determined for the gene; frequency of one or more of these genotypes and/or haplotypes in one or more populations; any known association(s) between a trait and a genotype or a haplotype for the gene.

Polymorphism Database - A collection of polymorphism data arranged in a systematic or
methodical way and capable of being individually accessed by electronic or other means.

Polynucleotide - A nucleic acid molecule comprised of single-stranded RNA or DNA or
comprised of complementary, double-stranded DNA.

Population Group - A group of individuals sharing a common ethnogeographic origin. 
Reference Population - A group of subjects or individuals who are predicted to be 
representative of the genetic variation found in the general population. Typically, the reference
population represents the genetic variation in the population at a certainty level of at least 85%, 
preferably at least 90%, more preferably at least 95% and even more preferably at least 99%. 

Single Nucleotide Polymorphism (SNP) - Typically, the specific pair of nucleotides 
observed at a single polymorphic site. In rare cases, three or four nucleotides may be found. 
Subject - A human individual whose genotypes or haplotypes or response to treatment or
disease state are to be determined.

Treatment - A stimulus administered internally or externally to a subject. 
Unphased - As applied to a sequence of nucleotide pairs for two or more polymorphic sites in 
a locus, unphased means the combination of nucleotides present at those polymorphic sites on a single 
copy of the locus is not known.

As discussed above, information on the identity of genotypes and haplotypes for the SLC26A2
gene of any particular individual as well as the frequency of such genotypes and haplotypes in any
particular population of individuals is expected to be useful for a variety of drug discovery and
development applications. Thus, the invention also provides compositions and methods for detecting
the novel SLC26A2 polymorphisms and haplotypes identified herein.

The compositions comprise at least one SLC26A2 genotyping oligonucleotide. In one 
embodiment, a SLC26A2 genotyping oligonucleotide is a probe or primer capable of hybridizing to a 
target region that is located close to, or that contains, one of the novel polymorphic sites described 
herein. As used herein, the term "oligonucleotide" refers to a polynucleotide molecule having fewer
than about 100 nucleotides. A preferred oligonucleotide of the invention is 10 to 35 nucleotides long.
More preferably, the oligonucleotide is between 15 and 30, and most preferably, between 20 and 25
nucleotides in length. The exact length of the oligonucleotide will depend on many factors that are
routinely considered and practiced by the skilled artisan. The oligonucleotide may be comprised of
any phosphorylation state of ribonucleotides, deoxyribonucleotides, and acyclic nucleotide derivatives,
and other functionally equivalent derivatives. Alternatively, oligonucleotides may have a phosphate-free
backbone, which may be comprised of linkages such as carboxymethyl, acetamidate, carbamate,
polyamide (peptide nucleic acid (PNA)) and the like (Varma, R. in Molecular Biology and
Biotechnology, A Comprehensive Desk Reference, Ed. R. Meyers, VCH Publishers, Inc. (1995),
pages 617-620). Oligonucleotides of the invention may be prepared by chemical synthesis using any
suitable methodology known in the art, or may be derived from a biological sample, for example, by
restriction digestion. The oligonucleotides may be labeled, according to any technique known in the
art, including use of radiolabels, fluorescent labels, enzymatic labels, proteins, haptens, antibodies,
sequence tags and the like.

Genotyping oligonucleotides of the invention must be capable of specifically hybridizing to a
target region of a SLC26A2 polynucleotide, i.e., a SLC26A2 isogene. As used herein, specific
hybridization means the oligonucleotide forms an anti-parallel double-stranded structure with the
target region under certain hybridizing conditions, while failing to form such a structure when
incubated with a non-target region or a non-SLC26A2 polynucleotide under the same hybridizing
conditions. Preferably, the oligonucleotide specifically hybridizes to the target region under
conventional high stringency conditions. The skilled artisan can readily design and test
oligonucleotide probes and primers suitable for detecting polymorphisms in the SLC26A2 gene using
the polymorphism information provided herein in conjunction with the known sequence information
for the SLC26A2 gene and routine techniques.

A nucleic acid molecule such as an oligonucleotide or polynucleotide is said to be a "perfect"
or "complete" complement of another nucleic acid molecule if every nucleotide of one of the
molecules is complementary to the nucleotide at the corresponding position of the other molecule. A
nucleic acid molecule is "substantially complementary" to another molecule if it hybridizes to that
molecule with sufficient stability to remain in a duplex form under conventional low-stringency
conditions. Conventional hybridization conditions are described, for example, by Sambrook J. et al.,
in Molecular Cloning, A Laboratory Manual, 2nd Edition, Cold Spring Harbor Press, Cold Spring
Harbor, NY (1989) and by Haymes, B.D. et al. in Nucleic Acid Hybridization, A Practical Approach,
IRL Press, Washington, D.C. (1985). While perfectly complementary oligonucleotides are preferred
for detecting polymorphisms, departures from complete complementarity are contemplated where such
departures do not prevent the molecule from specifically hybridizing to the target region. For
example, an oligonucleotide primer may have a non-complementary fragment at its 5' end, with the
remainder of the primer being complementary to the target region. Alternatively, non-complementary
nucleotides may be interspersed into the oligonucleotide probe or primer as long as the resulting probe
or primer is still capable of specifically hybridizing to the target region.

Preferred genotyping oligonucleotides of the invention are allele-specific oligonucleotides. As
used herein, the term allele-specific oligonucleotide (ASO) means an oligonucleotide that is able,
under sufficiently stringent conditions, to hybridize specifically to one allele of a gene, or other locus,
at a target region containing a polymorphic site while not hybridizing to the corresponding region in
another allele(s). As understood by the skilled artisan, allele-specificity will depend upon a variety of
readily optimized stringency conditions, including salt and formamide concentrations, as well as
temperatures for both the hybridization and washing steps. Examples of hybridization and washing
conditions typically used for ASO probes are found in Kogan et al., "Genetic Prediction of Hemophilia
A" in PCR Protocols, A Guide to Methods and Applications, Academic Press, 1990 and Ruano et al.,
Proc. Natl. Acad. Sci. USA 87:6296-6300, 1990. Typically, an ASO will be perfectly complementary
to one allele while containing a single mismatch for another allele.




wo 01/98318 



PCT/USOl/20028 



Allele-specific oligonucleotides of the invention include ASO probes and ASO primers. ASO
probes which usually provide good discrimination between different alleles are those in which a
central position of the oligonucleotide probe aligns with the polymorphic site in the target region (e.g.,
approximately the 7th or 8th position in a 15mer, the 8th or 9th position in a 16mer, and the 10th or 11th
position in a 20mer). An ASO primer of the invention has a 3' terminal nucleotide, or preferably a 3'
penultimate nucleotide, that is complementary to only one nucleotide of a particular SNP, thereby
acting as a primer for polymerase-mediated extension only if the allele containing that nucleotide is
present. ASO probes and primers hybridizing to either the coding or noncoding strand are
contemplated by the invention.
ASO probes and primers listed below use the appropriate nucleotide symbol (R = G or A, Y =
T or C, M = A or C, K = G or T, S = G or C, and W = A or T; WIPO Standard ST.25) at the position of
the polymorphic site to represent the two alternative allelic variants observed at that polymorphic site.

A preferred ASO probe for detecting SLC26A2 gene polymorphisms comprises a nucleotide
sequence, listed 5' to 3', selected from the group consisting of:

AAGTCCTRTACCCAG (SEQ ID NO: 4) and its complement,

TTAAGGARAAGGGAC (SEQ ID NO: 5) and its complement,

TCTCATTWTGGAAAA (SEQ ID NO: 6) and its complement, and

CAATCCCWCTGTGAG (SEQ ID NO: 7) and its complement.
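As an illustration only, the following Python sketch shows how a listed probe written with these two-fold symbols can be expanded into its two allele-specific forms. The function names are hypothetical and are not part of the invention, and the sketch assumes each listed oligonucleotide contains exactly one degenerate symbol, as the sequences above do. For SEQ ID NO: 4 it reports that the polymorphic site sits at position 8 of the 15mer, consistent with the central-position preference described above, and it gives the reverse complement of each allele-specific probe for targeting the opposite strand.

```python
# Two-fold degenerate symbols (WIPO Standard ST.25) used in the listings above.
DEGENERATE = {"R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def polymorphic_position(oligo: str) -> int:
    """Return the 1-based position of the single degenerate symbol in oligo."""
    hits = [i for i, base in enumerate(oligo, start=1) if base in DEGENERATE]
    if len(hits) != 1:
        raise ValueError("expected exactly one degenerate symbol")
    return hits[0]


def expand_aso(oligo: str) -> list[str]:
    """Return one allele-specific sequence per nucleotide the symbol stands for."""
    idx = polymorphic_position(oligo) - 1
    return [oligo[:idx] + base + oligo[idx + 1:] for base in DEGENERATE[oligo[idx]]]


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous A/C/G/T sequence, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    probe = "AAGTCCTRTACCCAG"  # SEQ ID NO: 4
    print(polymorphic_position(probe))  # 8
    for allele_probe in expand_aso(probe):
        print(allele_probe, reverse_complement(allele_probe))
```

The same check applied to SEQ ID NOs: 5 to 7 also places the polymorphic site at position 8.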

A preferred ASO primer for detecting SLC26A2 gene polymorphisms comprises a nucleotide
sequence, listed 5' to 3', selected from the group consisting of:

CTTGGGAAGTCCTRT (SEQ ID NO: 8); AACTGGCTGGGTAYA (SEQ ID NO: 9);
GCTCAATTAAGGARA (SEQ ID NO: 10); TCTTATGTCCCTTYT (SEQ ID NO: 11);
TTAGCCTCTCATTWT (SEQ ID NO: 12); ATGTAGTTTTCCAWA (SEQ ID NO: 13);
TCAGTGCAATCCCWC (SEQ ID NO: 14); and GAATCCCTCACAGWG (SEQ ID NO: 15).

Other genotyping oligonucleotides of the invention hybridize to a target region located one to
several nucleotides downstream of one of the novel polymorphic sites identified herein. Such
oligonucleotides are useful in polymerase-mediated primer extension methods for detecting one of the
novel polymorphisms described herein and therefore such genotyping oligonucleotides are referred to
herein as "primer-extension oligonucleotides". In a preferred embodiment, the 3'-terminus of a
primer-extension oligonucleotide is a deoxynucleotide complementary to the nucleotide located
immediately adjacent to the polymorphic site.

A particularly preferred oligonucleotide primer for detecting SLC26A2 gene polymorphisms
by primer extension terminates in a nucleotide sequence, listed 5' to 3', selected from the group
consisting of:

GGGAAGTCCT (SEQ ID NO: 16); TGGCTGGGTA (SEQ ID NO: 17);
CAATTAAGGA (SEQ ID NO: 18); TATGTCCCTT (SEQ ID NO: 19);
GCCTCTCATT (SEQ ID NO: 20); TAGTTTTCCA (SEQ ID NO: 21);
GTGCAATCCC (SEQ ID NO: 22); and TCCCTCACAG (SEQ ID NO: 23).
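The single-base readout that a primer-extension oligonucleotide provides can be modeled at the level of sequence strings, as in the minimal Python sketch below. It is a simplification under stated assumptions, not an assay protocol: each gene copy is written on the same strand as the primer, hybridization is treated as exact string matching, and the polymerase is assumed to add exactly one nucleotide (as with chain-terminating nucleotides). The flanking sequence in the example is assembled, for illustration, from the overlap of the ASO primer CTTGGGAAGTCCTRT and the ASO probe AAGTCCTRTACCCAG listed above, and the function names are hypothetical.

```python
from typing import List, Optional


def extended_base(copy_seq: str, primer: str) -> Optional[str]:
    """Return the nucleotide added to the primer's 3' end on one gene copy.

    copy_seq is written 5' to 3' on the same strand as the primer. The primer
    anneals to the complementary strand, so the polymerase adds the nucleotide
    that follows the primer sequence in copy_seq. Returns None if the primer
    does not match or ends at the last base.
    """
    start = copy_seq.find(primer)
    if start == -1:
        return None
    site = start + len(primer)
    return copy_seq[site] if site < len(copy_seq) else None


def call_site(copies: List[str], primer: str) -> str:
    """Combine the single-base extensions on both copies into a nucleotide pair."""
    bases = sorted(b for b in (extended_base(c, primer) for c in copies) if b is not None)
    return "/".join(bases)


if __name__ == "__main__":
    # Flanking sequence from the overlap of the ASO primer and ASO probe above;
    # the {} placeholder marks the R (G or A) polymorphic site.
    flank = "CTTGGGAAGTCCT{}TACCCAG"
    copies = [flank.format("G"), flank.format("A")]
    print(call_site(copies, "GGGAAGTCCT"))  # A/G
```

With the primer-extension oligonucleotide GGGAAGTCCT, whose 3' end lies immediately 5' of the R position, a G/A heterozygote yields both extension products, while a homozygote yields only one.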

In some embodiments, a composition contains two or more differently labeled genotyping
oligonucleotides for simultaneously probing the identity of nucleotides at two or more polymorphic
sites. It is also contemplated that primer compositions may contain two or more sets of allele-specific
primer pairs to allow simultaneous targeting and amplification of two or more regions containing a
polymorphic site.

SLC26A2 genotyping oligonucleotides of the invention may also be immobilized on or
synthesized on a solid surface such as a microchip, bead, or glass slide (see, e.g., WO 98/20020 and
WO 98/20019). Such immobilized genotyping oligonucleotides may be used in a variety of
polymorphism detection assays, including but not limited to probe hybridization and polymerase
extension assays. Immobilized SLC26A2 genotyping oligonucleotides of the invention may comprise
an ordered array of oligonucleotides designed to rapidly screen a DNA sample for polymorphisms in
multiple genes at the same time.

In another embodiment, the invention provides a kit comprising at least two genotyping
oligonucleotides packaged in separate containers. The kit may also contain other components such as
hybridization buffer (where the oligonucleotides are to be used as a probe) packaged in a separate
container. Alternatively, where the oligonucleotides are to be used to amplify a target region, the kit
may contain, packaged in separate containers, a polymerase and a reaction buffer optimized for primer
extension mediated by the polymerase, such as PCR.

The above described oligonucleotide compositions and kits are useful in methods for
genotyping and/or haplotyping the SLC26A2 gene in an individual. As used herein, the terms
"SLC26A2 genotype" and "SLC26A2 haplotype" mean the genotype or haplotype contains the
nucleotide pair or nucleotide, respectively, that is present at one or more of the novel polymorphic
sites described herein and may optionally also include the nucleotide pair or nucleotide present at one
or more additional polymorphic sites in the SLC26A2 gene. The additional polymorphic sites may be
currently known polymorphic sites or sites that are subsequently discovered.

One embodiment of the genotyping method involves isolating from the individual a nucleic
acid sample comprising the two copies of the SLC26A2 gene, or a fragment thereof, that are present in
the individual, and determining the identity of the nucleotide pair at one or more polymorphic sites
selected from the group consisting of PS1, PS2, PS3 and PS5 in the two copies to assign a SLC26A2
genotype to the individual. As will be readily understood by the skilled artisan, the two "copies" of a
gene in an individual may be the same allele or may be different alleles. In a preferred embodiment of
the genotyping method, the identity of the nucleotide pair at PS4 is also determined. In a particularly
preferred embodiment, the genotyping method comprises determining the identity of the nucleotide
pair at each of PS1-PS5.

Typically, the nucleic acid sample is isolated from a biological sample taken from the
individual, such as a blood sample or tissue sample. Suitable tissue samples include whole blood,
semen, saliva, tears, urine, fecal material, sweat, buccal, skin and hair. The nucleic acid sample may
be comprised of genomic DNA, mRNA, or cDNA and, in the latter two cases, the biological sample
must be obtained from a tissue in which the SLC26A2 gene is expressed. Furthermore, it will be
understood by the skilled artisan that mRNA or cDNA preparations would not be used to detect
polymorphisms located in introns or in 5' and 3' untranslated regions. If a SLC26A2 gene fragment is
isolated, it must contain the polymorphic site(s) to be genotyped.

One embodiment of the haplotyping method comprises isolating from the individual a nucleic
acid sample containing only one of the two copies of the SLC26A2 gene, or a fragment thereof, that is
present in the individual and determining in that copy the identity of the nucleotide at one or more
polymorphic sites selected from the group consisting of PS1, PS2, PS3 and PS5 to assign
a SLC26A2 haplotype to the individual. The nucleic acid may be isolated using any method capable
of separating the two copies of the SLC26A2 gene or fragment such as one of the methods described
above for preparing SLC26A2 isogenes, with targeted in vivo cloning being the preferred approach.
As will be readily appreciated by those skilled in the art, any individual clone will only provide
haplotype information on one of the two SLC26A2 gene copies present in an individual. If haplotype
information is desired for the individual's other copy, additional SLC26A2 clones will need to be
examined. Typically, at least five clones should be examined to have more than a 90% probability of
haplotyping both copies of the SLC26A2 gene in an individual. In some embodiments, the
haplotyping method also comprises identifying the nucleotide at PS4. In a particularly preferred
embodiment, the nucleotide at each of PS1-PS5 is identified.
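The figure of at least five clones can be checked against a simple probability model. This model is an assumption made here for illustration, not a derivation stated in the text: each clone independently carries either gene copy with probability 1/2. Under that model, a copy is missed only when every clone carries the other one, so the probability of seeing both copies in n clones is 1 - 2(1/2)^n. This gives 0.875 for four clones and 0.9375 for five, matching the statement that five clones give more than a 90% probability. The Python sketch below uses exact fractions, and its function names are hypothetical.

```python
from fractions import Fraction


def prob_both_copies(n_clones: int) -> Fraction:
    """Probability that n clones include both gene copies.

    Assumes each clone independently carries either copy with probability 1/2,
    so a copy is missed only if all n clones carry the other one.
    """
    if n_clones < 2:
        return Fraction(0)
    return 1 - 2 * Fraction(1, 2) ** n_clones


def min_clones(target: float) -> int:
    """Smallest number of clones whose probability exceeds target (0 <= target < 1)."""
    n = 2
    while prob_both_copies(n) <= target:
        n += 1
    return n


if __name__ == "__main__":
    for n in range(2, 7):
        print(n, float(prob_both_copies(n)))
    print(min_clones(0.9))  # 5
```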

In another embodiment, the haplotyping method comprises determining whether an individual
has one or more of the SLC26A2 haplotypes shown in Table 5. This can be accomplished by
identifying, for one or both copies of the individual's SLC26A2 gene, the phased sequence of
nucleotides present at each of PS1-PS5. The present invention also contemplates that typically only a
subset of PS1-PS5 will need to be directly examined to assign to an individual one or more of the
haplotypes shown in Table 5. This is because at least one polymorphic site in a gene is frequently in
strong linkage disequilibrium with one or more other polymorphic sites in that gene (Drysdale, CM et
al. 2000 PNAS 97:10483-10488; Rieder MJ et al. 1999 Nature Genetics 22:59-62). Two sites are said
to be in linkage disequilibrium if the presence of a particular variant at one site enhances the
predictability of another variant at the second site (Stephens, JC 1999, Mol. Diag. 4:309-317).
Techniques for determining whether any two polymorphic sites are in linkage disequilibrium are
well-known in the art (Weir B.S. 1996 Genetic Data Analysis II, Sinauer Associates, Inc. Publishers,
Sunderland, MA).
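Among the well-known techniques referred to above, linkage disequilibrium between two biallelic sites is commonly summarized by the coefficient D = p(AB) - p(A)p(B), its normalized form D' and the squared correlation r^2, all computed from phased two-site haplotypes. The Python sketch below is one minimal way to compute them. It assumes phased data are available as two-character strings (nucleotide at the first site followed by nucleotide at the second site); the function name is hypothetical, and the example haplotypes are invented for illustration and are not SLC26A2 data.

```python
from collections import Counter
from typing import Dict, List


def ld_stats(haplotypes: List[str]) -> Dict[str, float]:
    """Return D, |D'| and r^2 for two biallelic sites.

    Each haplotype is a phased two-character string: the nucleotide at the
    first site followed by the nucleotide at the second site.
    """
    n = len(haplotypes)
    first_alleles = sorted({h[0] for h in haplotypes})
    second_alleles = sorted({h[1] for h in haplotypes})
    if len(first_alleles) != 2 or len(second_alleles) != 2:
        raise ValueError("both sites must be biallelic in the sample")
    a, b = first_alleles[0], second_alleles[0]  # reference allele at each site
    p_a = sum(h[0] == a for h in haplotypes) / n
    p_b = sum(h[1] == b for h in haplotypes) / n
    p_ab = Counter(haplotypes)[a + b] / n
    d = p_ab - p_a * p_b
    # Largest |D| possible given the allele frequencies (depends on the sign of D).
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return {"D": d, "D_prime": abs(d) / d_max, "r2": r2}


if __name__ == "__main__":
    # Invented phased haplotypes in which the two sites always co-vary.
    print(ld_stats(["GA"] * 6 + ["TC"] * 4))
```

An r^2 close to 1 means that genotyping one site effectively predicts the other, which is the situation in which only a subset of PS1-PS5 would need to be examined directly.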

In a preferred embodiment, a SLC26A2 haplotype pair is determined for an individual by
identifying the phased sequence of nucleotides at one or more polymorphic sites selected from the
group consisting of PS1, PS2, PS3 and PS5 in each copy of the SLC26A2 gene that is present in the
individual. In a particularly preferred embodiment, the haplotyping method comprises identifying the
phased sequence of nucleotides at each of PS1-PS5 in each copy of the SLC26A2 gene. When
haplotyping both copies of the gene, the identifying step is preferably performed with each copy of the
gene being placed in separate containers. However, it is also envisioned that if the two copies are
labeled with different tags, or are otherwise separately distinguishable or identifiable, it could be
possible in some cases to perform the method in the same container. For example, if first and second
copies of the gene are labeled with different first and second fluorescent dyes, respectively, and an
allele-specific oligonucleotide labeled with yet a third different fluorescent dye is used to assay the
polymorphic site(s), then detecting a combination of the first and third dyes would identify the
polymorphism in the first gene copy while detecting a combination of the second and third dyes would
identify the polymorphism in the second gene copy.

In both the genotyping and haplotyping methods, the identity of a nucleotide (or nucleotide
pair) at a polymorphic site(s) may be determined by amplifying a target region(s) containing the
polymorphic site(s) directly from one or both copies of the SLC26A2 gene, or a fragment thereof, and
determining the sequence of the amplified region(s) by conventional methods. It will be readily
appreciated by the skilled artisan that only one nucleotide will be detected at a polymorphic site in
individuals who are homozygous at that site, while two different nucleotides will be detected if the
individual is heterozygous for that site. The polymorphism may be identified directly, known as
positive-type identification, or by inference, referred to as negative-type identification. For example,
where a SNP is known to be guanine and cytosine in a reference population, a site may be positively
determined to be either guanine or cytosine for an individual homozygous at that site, or both guanine
and cytosine, if the individual is heterozygous at that site. Alternatively, the site may be negatively
determined to be not guanine (and thus cytosine/cytosine) or not cytosine (and thus guanine/guanine).
The target region(s) may be amplified using any oligonucleotide-directed amplification
method, including but not limited to polymerase chain reaction (PCR) (U.S. Patent No. 4,965,188),
ligase chain reaction (LCR) (Barany et al., Proc. Natl. Acad. Sci. USA 88:189-193, 1991;
WO90/01069), and oligonucleotide ligation assay (OLA) (Landegren et al., Science 241:1077-1080,
1988).

Other known nucleic acid amplification procedures may be used to amplify the target region,
including transcription-based amplification systems (U.S. Patent No. 5,130,238; EP 329,822; U.S.
Patent No. 5,169,766; WO89/06700) and isothermal methods (Walker et al., Proc. Natl. Acad. Sci.
USA 89:392-396, 1992).

A polymorphism in the target region may also be assayed before or after amplification using
one of several hybridization-based methods known in the art. Typically, allele-specific
oligonucleotides are utilized in performing such methods. The allele-specific oligonucleotides may be
used as differently labeled probe pairs, with one member of the pair showing a perfect match to one
variant of a target sequence and the other member showing a perfect match to a different variant. In
some embodiments, more than one polymorphic site may be detected at once using a set of allele-specific
oligonucleotides or oligonucleotide pairs. Preferably, the members of the set have melting
temperatures within 5°C, and more preferably within 2°C, of each other when hybridizing to each of
the polymorphic sites being detected.
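The text does not specify how melting temperatures are to be estimated. As a rough illustration only, the Python sketch below uses the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, a classical rule of thumb for short oligonucleotides that ignores salt, formamide and nearest-neighbor effects; practical probe design would normally rely on a nearest-neighbor model under the actual assay conditions. The helper names are hypothetical. Applied to the G and A forms of SEQ ID NO: 4 it gives 46 and 44 °C, and for SEQ ID NO: 5 it gives 44 and 42 °C, a spread of 4 °C that falls inside the 5 °C window but not the 2 °C window.

```python
from typing import List


def wallace_tm(oligo: str) -> int:
    """Rough Tm estimate in deg C by the Wallace rule: 2*(A+T) + 4*(G+C).

    Only meaningful for short, unambiguous oligonucleotides; salt, formamide
    and nearest-neighbor effects are ignored.
    """
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def within_window(oligos: List[str], window: float) -> bool:
    """True if all estimated melting temperatures lie within `window` deg C."""
    temps = [wallace_tm(o) for o in oligos]
    return max(temps) - min(temps) <= window


if __name__ == "__main__":
    # G and A forms of SEQ ID NO: 4, then G and A forms of SEQ ID NO: 5.
    probes = ["AAGTCCTGTACCCAG", "AAGTCCTATACCCAG",
              "TTAAGGAGAAGGGAC", "TTAAGGAAAAGGGAC"]
    print([wallace_tm(p) for p in probes])  # [46, 44, 44, 42]
    print(within_window(probes, 5.0), within_window(probes, 2.0))  # True False
```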

Hybridization of an allele-specific oligonucleotide to a target polynucleotide may be
performed with both entities in solution, or such hybridization may be performed when either the
oligonucleotide or the target polynucleotide is covalently or noncovalently affixed to a solid support.
Attachment may be mediated, for example, by antibody-antigen interactions, poly-L-Lys, streptavidin
or avidin-biotin, salt bridges, hydrophobic interactions, chemical linkages, UV cross-linking, baking,
etc. Allele-specific oligonucleotides may be synthesized directly on the solid support or attached to
the solid support subsequent to synthesis. Solid supports suitable for use in detection methods of the
invention include substrates made of silicon, glass, plastic, paper and the like, which may be formed,
for example, into wells (as in 96-well plates), slides, sheets, membranes, fibers, chips, dishes, and
beads. The solid support may be treated, coated or derivatized to facilitate the immobilization of the
allele-specific oligonucleotide or target nucleic acid.

The genotype or haplotype for the SLC26A2 gene of an individual may also be determined by
hybridization of a nucleic acid sample containing one or both copies of the gene, or fragment(s)
thereof, to nucleic acid arrays and subarrays such as described in WO 95/11995. The arrays would
contain a battery of allele-specific oligonucleotides representing each of the polymorphic sites to be
included in the genotype or haplotype.

The identity of polymorphisms may also be determined using a mismatch detection technique,
including but not limited to the RNase protection method using riboprobes (Winter et al., Proc. Natl.
Acad. Sci. USA 82:7575, 1985; Meyers et al., Science 230:1242, 1985) and proteins which recognize
nucleotide mismatches, such as the E. coli mutS protein (Modrich, P. Ann. Rev. Genet. 25:229-253,
1991). Alternatively, variant alleles can be identified by single strand conformation polymorphism
(SSCP) analysis (Orita et al., Genomics 5:874-879, 1989; Humphries et al., in Molecular Diagnosis of
Genetic Diseases, R. Elles, ed., pp. 321-340, 1996) or denaturing gradient gel electrophoresis (DGGE)
(Wartell et al., Nucl. Acids Res. 18:2699-2706, 1990; Sheffield et al., Proc. Natl. Acad. Sci. USA
86:232-236, 1989).

A polymerase-mediated primer extension method may also be used to identify the
polymorphism(s). Several such methods have been described in the patent and scientific literature and
include the "Genetic Bit Analysis" method (WO92/15712) and the ligase/polymerase mediated genetic
bit analysis (U.S. Patent No. 5,679,524). Related methods are disclosed in WO91/02087, WO90/09455,
WO95/17676, U.S. Patent Nos. 5,302,509, and 5,945,283. Extended primers containing a
polymorphism may be detected by mass spectrometry as described in U.S. Patent No. 5,605,798.
Another primer extension method is allele-specific PCR (Ruano et al., Nucl. Acids Res. 17:8392, 1989;
Ruano et al., Nucl. Acids Res. 19:6877-6882, 1991; WO 93/22456; Turki et al., J. Clin. Invest.
95:1635-1641, 1995). In addition, multiple polymorphic sites may be investigated by simultaneously
amplifying multiple regions of the nucleic acid using sets of allele-specific primers as described in
Wallace et al. (WO89/10414).
In addition, the identity of the allele(s) present at any of the novel polymorphic sites described
herein may be indirectly determined by genotyping another polymorphic site that is in linkage
disequilibrium with the polymorphic site that is of interest. Polymorphic sites in linkage
disequilibrium with the presently disclosed polymorphic sites may be located in regions of the gene or
in other genomic regions not examined herein. Genotyping of a polymorphic site in linkage
disequilibrium with the novel polymorphic sites described herein may be performed by, but is not
limited to, any of the above-mentioned methods for detecting the identity of the allele at a
polymorphic site.

In another aspect of the invention, an individual's SLC26A2 haplotype pair is predicted from
the individual's SLC26A2 genotype using information on haplotype pairs known to exist in a reference population.
In its broadest embodiment, the haplotyping prediction method comprises identifying a SLC26A2
genotype for the individual at two or more SLC26A2 polymorphic sites described herein, enumerating
all possible haplotype pairs which are consistent with the genotype, accessing data containing
SLC26A2 haplotype pairs identified in a reference population, and assigning a haplotype pair to the
individual that is consistent with the data. In one embodiment, the reference haplotype pairs include
the SLC26A2 haplotype pairs shown in Table 4.
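The enumeration step can be sketched in a few lines of Python. This is an illustrative sketch, not the preferred process of U.S. Provisional Application Serial No. 60/198,340: the genotype is assumed to be given as a list of unphased nucleotide pairs, one per polymorphic site, and haplotypes are written as strings with one nucleotide per site. Because swapping the two copies does not create a new pair, a genotype with h heterozygous sites yields 2^(h-1) distinct haplotype pairs (one pair if h = 0). The function name and the example genotype are hypothetical.

```python
from itertools import product
from typing import List, Set, Tuple


def possible_haplotype_pairs(genotype: List[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """All unordered haplotype pairs consistent with an unphased genotype.

    genotype gives the nucleotide pair observed at each site, in site order.
    Each pair is stored sorted, so (h1, h2) and (h2, h1) are counted once.
    """
    pairs = set()
    # For every site, choose which of the two nucleotides goes on the first copy.
    for choice in product((0, 1), repeat=len(genotype)):
        first = "".join(site[c] for site, c in zip(genotype, choice))
        second = "".join(site[1 - c] for site, c in zip(genotype, choice))
        pairs.add(tuple(sorted((first, second))))
    return pairs


if __name__ == "__main__":
    # Hypothetical genotype: heterozygous at two sites, homozygous at the third.
    genotype = [("G", "A"), ("T", "C"), ("A", "A")]
    for pair in sorted(possible_haplotype_pairs(genotype)):
        print(pair)
```

For this example genotype the two candidate pairs are ACA/GTA and ATA/GCA; the assigning step then decides between them using the reference population data.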

Generally, the reference population should be composed of randomly-selected individuals
representing the major ethnogeographic groups of the world. A preferred reference population for use
in the methods of the present invention comprises an approximately equal number of individuals from
Caucasian, African-descent, Asian and Hispanic-Latino population groups, with the minimum number
of each group being chosen based on how rare a haplotype one wants to be guaranteed to see. For
example, if one wants to have a q% chance of not missing a haplotype that exists in the population at a
frequency of p%, the number of individuals (n) who must be sampled is given by
2n = log(1 - q)/log(1 - p), where p and q are expressed as fractions. A preferred
reference population allows the detection of any haplotype whose frequency is at least 10% with about
99% certainty and comprises about 20 unrelated individuals from each of the four population groups
named above. A particularly preferred reference population includes a 3-generation family
representing one or more of the four population groups to serve as controls for checking quality of
haplotyping procedures.
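The sample-size expression follows from a short argument: if a haplotype has population frequency p, the chance that it is absent from all 2n chromosomes carried by n sampled individuals is (1 - p)^(2n). Requiring this to be at most 1 - q gives 2n ≥ log(1 - q)/log(1 - p), the bound stated by the formula. The Python sketch below evaluates the expression and rounds up to a whole number of individuals; the function names are illustrative. For p = 0.10 and q = 0.99, 2n is about 43.7, or 22 individuals, in line with the approximately 20 individuals per group described above.

```python
import math


def chromosomes_needed(p: float, q: float) -> float:
    """2n = log(1 - q) / log(1 - p), with p and q given as fractions in (0, 1).

    p: population frequency of the haplotype.
    q: desired probability of observing it at least once.
    """
    return math.log(1 - q) / math.log(1 - p)


def individuals_needed(p: float, q: float) -> int:
    """Smallest whole number of diploid individuals n meeting the bound."""
    return math.ceil(chromosomes_needed(p, q) / 2)


if __name__ == "__main__":
    print(round(chromosomes_needed(0.10, 0.99), 1))  # 43.7
    print(individuals_needed(0.10, 0.99))  # 22
```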

In a preferred embodiment, the haplotype frequency data for each ethnogeographic group is
examined to determine whether it is consistent with Hardy-Weinberg equilibrium. Hardy-Weinberg
equilibrium (D.L. Hartl et al., Principles of Population Genomics, Sinauer Associates (Sunderland,
MA), 3rd Ed., 1997) postulates that the frequency of finding the haplotype pair $H_iH_j$ is equal to
$P_{HW}(H_iH_j) = 2p(H_i)p(H_j)$ if $H_i \neq H_j$, and $P_{HW}(H_iH_j) = p(H_i)^2$ if $H_i = H_j$,
where $p(H)$ is the frequency of haplotype $H$ in the group.
A statistically significant difference between the observed and expected haplotype frequencies could
be due to one or more factors including significant inbreeding in the population group, strong selective
pressure on the gene, sampling bias, and/or errors in the genotyping process. If large deviations from
Hardy-Weinberg equilibrium are observed in an ethnogeographic group, the number of individuals in
that group can be increased to see if the deviation is due to a sampling bias. If a larger sample size
does not reduce the difference between observed and expected haplotype pair frequencies, then one
may wish to consider haplotyping the individual using a direct haplotyping method such as, for
example, CLASPER System™ technology (U.S. Patent No. 5,866,404), single molecule dilution, or
allele-specific long-range PCR (Michalotos-Beloin et al., Nucleic Acids Res. 24:4841-4843, 1996).
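One common way to test for a statistically significant departure from Hardy-Weinberg equilibrium is a Pearson chi-square comparison of observed haplotype-pair counts with the counts expected under equilibrium, namely N·2p(H_i)p(H_j) for H_i ≠ H_j and N·p(H_i)^2 for H_i = H_j among N individuals. The text does not prescribe a particular test, so this choice is an assumption. The Python sketch below estimates haplotype frequencies from the sample itself and returns the chi-square statistic; the function name is hypothetical. For k haplotypes the statistic is usually compared against a chi-square distribution with k(k-1)/2 degrees of freedom (one degree of freedom, with a 5% critical value of 3.84, for two haplotypes), and with small expected counts an exact test is generally preferred.

```python
from collections import Counter
from typing import List, Tuple


def hwe_chi_square(pairs: List[Tuple[str, str]]) -> float:
    """Pearson chi-square of observed haplotype-pair counts vs Hardy-Weinberg.

    Expected count of pair (Hi, Hj) among N individuals is N * 2 * p_i * p_j
    for Hi != Hj and N * p_i**2 for Hi == Hj, with haplotype frequencies
    p estimated from the same 2N chromosomes.
    """
    n = len(pairs)
    normalized = [tuple(sorted(pair)) for pair in pairs]
    observed = Counter(normalized)
    hap_counts = Counter(h for pair in normalized for h in pair)
    freq = {h: count / (2 * n) for h, count in hap_counts.items()}
    haps = sorted(freq)
    chi2 = 0.0
    for i, hi in enumerate(haps):
        for hj in haps[i:]:
            p_hw = freq[hi] ** 2 if hi == hj else 2 * freq[hi] * freq[hj]
            expected = n * p_hw
            chi2 += (observed[(hi, hj)] - expected) ** 2 / expected
    return chi2


if __name__ == "__main__":
    # Invented sample: four individuals, all carrying the same heterozygous pair.
    print(hwe_chi_square([("A", "B")] * 4))  # 4.0
```

The invented sample is only an arithmetic check; a real group would contain many more individuals.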

In one embodiment of this method for predicting a SLC26A2 haplotype pair for an individual,
the assigning step involves performing the following analysis. First, each of the possible haplotype
pairs is compared to the haplotype pairs in the reference population. Generally, only one of the
haplotype pairs in the reference population matches a possible haplotype pair and that pair is assigned
to the individual. Occasionally, only one haplotype represented in the reference haplotype pairs is
consistent with a possible haplotype pair for an individual, and in such cases the individual is assigned
a haplotype pair containing this known haplotype and a new haplotype derived by subtracting the
known haplotype from the possible haplotype pair. Alternatively, the haplotype pair in an individual
may be predicted from the individual's genotype for that gene using reported methods (e.g., Clark et
al. 1990 Mol Biol Evol 7:111-22) or through a commercial haplotyping service such as offered by
Genaissance Pharmaceuticals, Inc. (New Haven, CT). In rare cases, either no haplotypes in the
reference population are consistent with the possible haplotype pairs, or alternatively, multiple
reference haplotype pairs are consistent with the possible haplotype pairs. In such cases, the
individual is preferably haplotyped using a direct molecular haplotyping method such as, for example,
CLASPER System™ technology (U.S. Patent No. 5,866,404), SMD, or allele-specific long-range PCR
(Michalotos-Beloin et al., supra). A preferred process for predicting SLC26A2 haplotype pairs from
SLC26A2 genotypes is described in U.S. Provisional Application Serial No. 60/198,340 and the
corresponding International Application, PCT/US01/12831.
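The comparison of possible haplotype pairs with reference haplotype pairs can be expressed as a small decision function. The Python sketch below is one reading of the cases described above, under the assumption that pairs are stored as sorted tuples of haplotype strings. It is not the preferred process of the cited provisional application, and the names are hypothetical. It returns the single matching reference pair when there is one. When no reference pair matches, it returns the one possible pair containing exactly one known haplotype, the other member being the new haplotype. It returns None, signaling that direct molecular haplotyping is advisable, when no pair or several pairs fit.

```python
from typing import Optional, Set, Tuple

Pair = Tuple[str, str]


def assign_pair(possible: Set[Pair], reference_pairs: Set[Pair]) -> Optional[Pair]:
    """Assign a haplotype pair from a reference population, or return None.

    possible: unordered pairs consistent with the genotype, each stored sorted.
    reference_pairs: haplotype pairs observed in the reference population.
    None means no unique assignment, i.e. direct haplotyping is advisable.
    """
    reference = {tuple(sorted(p)) for p in reference_pairs}
    matches = possible & reference
    if len(matches) == 1:
        return next(iter(matches))
    if matches:
        return None  # several reference pairs fit the genotype
    known = {h for p in reference for h in p}
    # Possible pairs in which exactly one member is a known haplotype;
    # the other member is then the new haplotype implied by the genotype.
    partial = [p for p in possible if (p[0] in known) != (p[1] in known)]
    if len(partial) == 1:
        return partial[0]
    return None


if __name__ == "__main__":
    possible = {("ACA", "GTA"), ("ATA", "GCA")}
    print(assign_pair(possible, {("ACA", "GTA"), ("ACA", "ACA")}))  # ('ACA', 'GTA')
    print(assign_pair(possible, {("ATA", "ATA")}))  # ('ATA', 'GCA')
```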

The invention also provides a method for determining the frequency of a SLC26A2 genotype,
haplotype, or haplotype pair in a population. The method comprises, for each member of the
population, determining the genotype or the haplotype pair for the novel SLC26A2 polymorphic sites
described herein, and calculating the frequency at which any particular genotype, haplotype, or haplotype pair
is found in the population. The population may be a reference population, a family population, a same
sex population, a population group, or a trait population (e.g., a group of individuals exhibiting a trait
of interest such as a medical condition or response to a therapeutic treatment).
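The frequency calculation amounts to counting, as in the minimal Python sketch below. It assumes genotypes are given as labels such as "G/A" and haplotype pairs as tuples of haplotype strings; these are illustrative conventions, not taken from the text, and the function names are hypothetical. Genotype or haplotype-pair frequencies are counted per individual, while haplotype frequencies are counted over the 2N chromosomes of N individuals.

```python
from collections import Counter
from typing import Dict, List, Tuple


def label_frequencies(labels: List[str]) -> Dict[str, float]:
    """Fraction of population members carrying each genotype or pair label."""
    counts = Counter(labels)
    return {label: c / len(labels) for label, c in counts.items()}


def haplotype_frequencies(pairs: List[Tuple[str, str]]) -> Dict[str, float]:
    """Haplotype frequencies counted over the 2N chromosomes of N individuals."""
    counts = Counter(h for pair in pairs for h in pair)
    return {h: c / (2 * len(pairs)) for h, c in counts.items()}


if __name__ == "__main__":
    # Invented calls for four individuals: one-site genotypes and two-site haplotype pairs.
    print(label_frequencies(["G/A", "G/G", "G/A", "A/A"]))
    print(haplotype_frequencies([("GT", "AC"), ("GT", "GT"), ("AC", "AC"), ("GT", "AC")]))
```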

In another aspect of the invention, frequency data for SLC26A2 genotypes, haplotypes, and/or
haplotype pairs are determined in a reference population and used in a method for identifying an
association between a trait and a SLC26A2 genotype, haplotype, or haplotype pair. The trait may be
any detectable phenotype, including but not limited to susceptibility to a disease or response to a
treatment. The method involves obtaining data on the frequency of the genotype(s), haplotype(s), or
haplotype pair(s) of interest in a reference population as well as in a population exhibiting the trait.
Frequency data for one or both of the reference and trait populations may be obtained by genotyping
or haplotyping each individual in the populations using one of the methods described above. The
haplotypes for the trait population may be determined directly or, alternatively, by the predictive
genotype to haplotype approach described above. In another embodiment, the frequency data for the
reference and/or trait populations is obtained by accessing previously determined frequency data,
which may be in written or electronic form. For example, the frequency data may be present in a
database that is accessible by a computer. Once the frequency data is obtained, the frequencies of the
genotype(s), haplotype(s), or haplotype pair(s) of interest in the reference and trait populations are
compared. In a preferred embodiment, the frequencies of all genotypes, haplotypes, and/or haplotype
pairs observed in the populations are compared. If a particular SLC26A2 genotype, haplotype, or
haplotype pair is more frequent in the trait population than in the reference population by a
statistically significant amount, then the trait is predicted to be associated with that SLC26A2
genotype, haplotype or haplotype pair. Preferably, the SLC26A2 genotype, haplotype, or haplotype
pair being compared in the trait and reference populations is selected from the full-genotypes and
full-haplotypes shown in Tables 4 and 5, or from sub-genotypes and sub-haplotypes derived from these
genotypes and haplotypes.
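One way to compare trait and reference frequencies is a Pearson chi-square test on a 2x2 table of carriers and non-carriers of the haplotype (or genotype) of interest. This is a minimal sketch, not the statistical procedure of this application: the choice of test, the 0.05 threshold, and the counts in the example are assumptions for illustration. For small groups, Fisher's exact test or a permutation test may be more appropriate.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 df, no continuity correction) on a 2x2 table.

    a: trait-population members carrying the haplotype of interest
    b: trait-population members not carrying it
    c: reference-population members carrying it
    d: reference-population members not carrying it
    Returns (chi2, p_value).
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column of the table needs a nonzero total")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def predicts_association(a, b, c, d, alpha=0.05):
    """True when the haplotype is more frequent in the trait population and the
    difference is significant at level alpha (alpha is an assumed threshold)."""
    more_frequent_in_trait = a / (a + b) > c / (c + d)
    _, p_value = chi_square_2x2(a, b, c, d)
    return more_frequent_in_trait and p_value < alpha


# Hypothetical counts: 30 of 100 trait-population members versus 15 of 100
# reference-population members carry the haplotype of interest.
chi2, p = chi_square_2x2(30, 70, 15, 85)
associated = predicts_association(30, 70, 15, 85)
```

When all genotypes or haplotypes observed in the populations are compared at once, as in the preferred embodiment, some correction for multiple comparisons would normally be applied to the threshold.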

In a preferred embodiment of the method, the trait of interest is a clinical response exhibited
by a patient to some therapeutic treatment, for example, response to a drug targeting SLC26A2 or
response to a therapeutic treatment for a medical condition. As used herein, "medical condition"
includes but is not limited to any condition or disease manifested as one or more physical and/or
psychological symptoms for which treatment is desirable, and includes previously and newly
identified diseases and other disorders. As used herein the term "clinical response" means any or all
of the following: a quantitative measure of the response, no response, and adverse response (i.e., side
effects).

In order to deduce a correlation between clinical response to a treatment and a SLC26A2
genotype, haplotype, or haplotype pair, it is necessary to obtain data on the clinical responses
exhibited by a population of individuals who received the treatment, hereinafter the "clinical
population". This clinical data may be obtained by analyzing the results of a clinical trial that has
already been run and/or the clinical data may be obtained by designing and carrying out one or more
new clinical trials. As used herein, the term "clinical trial" means any research study designed to
collect clinical data on responses to a particular treatment, and includes but is not limited to phase I,
phase II and phase III clinical trials. Standard methods are used to define the patient population and to
enroll subjects.

It is preferred that the individuals included in the clinical population have been graded for the
existence of the medical condition of interest. This is important in cases where the symptom(s) being
presented by the patients can be caused by more than one underlying condition, and where treatments
of the underlying conditions are not the same. An example of this would be where patients experience
breathing difficulties that are due to either asthma or respiratory infections. If both sets were treated
with an asthma medication, there would be a spurious group of apparent non-responders that did not
actually have asthma. These people would reduce the ability to detect any correlation between
haplotype and treatment outcome. This grading of potential patients could employ a standard physical
exam or one or more lab tests. Alternatively, grading of patients could use haplotyping for situations
where there is a strong correlation between haplotype pair and disease susceptibility or severity.

The therapeutic treatment of interest is administered to each individual in the trial population
and each individual's response to the treatment is measured using one or more predetermined criteria.
It is contemplated that in many cases, the trial population will exhibit a range of responses and that the
investigator will choose the number of responder groups (e.g., low, medium, high) into which the
various responses are divided. In addition, the SLC26A2 gene of each individual in the trial population is
genotyped and/or haplotyped, which may be done before or after administering the treatment.
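Assigning each measured response to one of the investigator's responder groups can be sketched with fixed cutoffs. The cutoff values (20 and 50, for example a hypothetical percent improvement in a symptom score) and the group labels below are assumptions chosen for illustration; in practice they follow from the predetermined response criteria.

```python
from bisect import bisect_right


def response_group(response, cutoffs=(20.0, 50.0), labels=("low", "medium", "high")):
    """Place a measured response into an investigator-defined responder group.

    `cutoffs` must be sorted in ascending order and there must be exactly one
    more label than cutoffs. A response equal to a cutoff falls into the higher group.
    """
    if len(labels) != len(cutoffs) + 1:
        raise ValueError("need exactly one more label than cutoffs")
    return labels[bisect_right(cutoffs, response)]


groups = [response_group(r) for r in (10.0, 20.0, 35.0, 75.0)]
```

Using `bisect_right` keeps the assignment correct for any number of groups, so the investigator can change the number of responder groups by changing only the two tuples.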

After both the clinical and polymorphism data have been obtained, correlations between
individual response and SLC26A2 genotype or haplotype content are determined. Correlations may be
produced in several ways. In one method, individuals are grouped by their SLC26A2 genotype or
haplotype (or haplotype pair) (also referred to as a polymorphism group), and then the averages and
standard deviations of the clinical responses exhibited by the members of each polymorphism group are
calculated.
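The grouping step can be sketched as follows. The group labels (written here as haplotype pairs such as "1/3") and the response values are hypothetical; the sketch only shows how per-group averages and sample standard deviations are obtained from (polymorphism group, clinical response) records.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_by_group(records):
    """records: iterable of (polymorphism_group, clinical_response) pairs.

    Returns {group: (n, mean, sample standard deviation)}. A group with a single
    member has no defined sample standard deviation, so NaN is reported.
    """
    groups = defaultdict(list)
    for group, response in records:
        groups[group].append(response)
    summary = {}
    for group, values in groups.items():
        sd = stdev(values) if len(values) > 1 else float("nan")
        summary[group] = (len(values), mean(values), sd)
    return summary


# Hypothetical clinical responses for two haplotype-pair groups.
records = [("1/3", 12.0), ("1/3", 14.0), ("3/3", 5.0), ("3/3", 7.0), ("3/3", 6.0)]
summary = summarize_by_group(records)
```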

These results are then analyzed to determine if any observed variation in clinical response
between polymorphism groups is statistically significant. Statistical analysis methods which may be
used are described in L.D. Fisher and G. van Belle, "Biostatistics: A Methodology for the Health
Sciences", Wiley-Interscience (New York) 1993. This analysis may also include a regression
calculation of which polymorphic sites in the SLC26A2 gene give the most significant contribution to
the differences in phenotype. One regression model useful in the invention is described in PCT
Application Serial No. PCT/US00/17540, entitled "Methods for Obtaining and Using Haplotype
Data".
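As a simplified illustration of a per-site regression (not the regression model of PCT/US00/17540), each polymorphic site can be coded additively as the number of variant alleles an individual carries (0, 1, or 2) and fitted against clinical response by ordinary least squares, one site at a time. Sites are then ranked by the fraction of response variance (R squared) each explains. The site names and data in the example are hypothetical.

```python
def simple_regression(x, y):
    """Ordinary least-squares fit y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0:
        # The site is monomorphic in this sample and carries no information.
        return 0.0, my, 0.0
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 0.0
    return slope, intercept, r_squared


def rank_sites(genotypes, responses):
    """genotypes: {site_name: [variant-allele count (0, 1 or 2) per individual]}.

    Returns (site, (slope, intercept, r_squared)) pairs, best-explaining site first.
    """
    fits = {site: simple_regression(counts, responses) for site, counts in genotypes.items()}
    return sorted(fits.items(), key=lambda item: item[1][2], reverse=True)


# Hypothetical allele counts at two sites for six individuals.
genotypes = {"PS3": [0, 1, 2, 0, 1, 2], "PS5": [0, 0, 1, 1, 0, 1]}
responses = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
ranked = rank_sites(genotypes, responses)
```

Fitting sites one at a time ignores linkage between sites; a multi-site model of the kind referenced above would be needed to separate their joint contributions.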

A second method for finding correlations between SLC26A2 haplotype content and clinical
responses uses predictive models based on error-minimizing optimization algorithms. One of many
possible optimization algorithms is a genetic algorithm (R. Judson, "Genetic Algorithms and Their
Uses in Chemistry" in Reviews in Computational Chemistry, Vol. 10, pp. 1-73, K. B. Lipkowitz and
D. B. Boyd, eds. (VCH Publishers, New York, 1997)). Simulated annealing (Press et al., "Numerical
Recipes in C: The Art of Scientific Computing", Cambridge University Press (Cambridge) 1992, Ch.
10), neural networks (E. Rich and K. Knight, "Artificial Intelligence", 2nd Edition (McGraw-Hill, New
York, 1991), Ch. 18), standard gradient descent methods (Press et al., supra, Ch. 10), or other global or
local optimization approaches (see discussion in Judson, supra) could also be used. Preferably, the
correlation is found using a genetic algorithm approach as described in PCT Application Serial No.
Correlations may also be analyzed using analysis of variance (ANOVA) techniques to
determine how much of the variation in the clinical data is explained by different subsets of the
polymorphic sites in the SLC26A2 gene. As described in PCT Application Serial No.
PCT/US00/17540, ANOVA is used to test hypotheses about whether a response variable is caused by
or correlated with one or more traits or variables that can be measured (Fisher and van Belle, supra,
Ch. 10).
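A one-way ANOVA across polymorphism groups can be sketched in a few lines. The function returns the F statistic and its degrees of freedom, together with eta squared (the between-group sum of squares divided by the total), which is one measure of how much of the variation in the clinical data the grouping explains. Converting F to a p-value requires the F distribution with (k - 1, n - k) degrees of freedom, which is assumed to come from a statistics table or package and is not reimplemented here. The response values are hypothetical.

```python
import math


def one_way_anova(groups):
    """groups: list of lists of clinical responses, one list per polymorphism group.

    Returns (F statistic, df_between, df_within, eta_squared).
    """
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n - k
    ms_within = ss_within / df_within
    f_stat = math.inf if ms_within == 0 else (ss_between / df_between) / ms_within
    ss_total = ss_between + ss_within
    eta_squared = ss_between / ss_total if ss_total else 0.0
    return f_stat, df_between, df_within, eta_squared


# Hypothetical responses in two polymorphism groups.
f_stat, df_b, df_w, eta_sq = one_way_anova([[5, 6, 7], [12, 13, 14]])
```

Repeating the calculation with individuals grouped by different subsets of the polymorphic sites shows which subset accounts for the largest share of the variation.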

From the analyses described above, a mathematical model may be readily constructed by the
skilled artisan that predicts clinical response as a function of SLC26A2 genotype or haplotype content.
Preferably, the model is validated in one or more follow-up clinical trials designed to test the model.
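A very simple example of such a model predicts each individual's response as the mean response of their haplotype-pair group in the original trial, falling back to the overall mean for pairs not seen there. Validation in a follow-up trial is then sketched as the mean absolute error of those predictions. Both the model form and the error measure are assumptions made for illustration, and the data are hypothetical.

```python
from statistics import mean


def fit_group_mean_model(training):
    """training: list of (haplotype_pair, response). Returns a predict(pair) function."""
    by_pair = {}
    for pair, response in training:
        by_pair.setdefault(pair, []).append(response)
    overall = mean(response for _, response in training)
    table = {pair: mean(values) for pair, values in by_pair.items()}

    def predict(pair):
        # Haplotype pairs not observed in the training trial get the overall mean.
        return table.get(pair, overall)

    return predict


def mean_absolute_error(predict, validation):
    """Average absolute difference between predicted and observed responses."""
    return mean(abs(predict(pair) - response) for pair, response in validation)


# Hypothetical data from an original trial and a follow-up validation trial.
training = [((1, 3), 12.0), ((1, 3), 14.0), ((3, 3), 6.0)]
validation = [((1, 3), 12.0), ((3, 3), 7.0)]
predict = fit_group_mean_model(training)
mae = mean_absolute_error(predict, validation)
```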

The identification of an association between a clinical response and a genotype or haplotype
(or haplotype pair) for the SLC26A2 gene may be the basis for designing a diagnostic method to
determine those individuals who will or will not respond to the treatment, or alternatively, who will
respond at a lower level and thus may require more treatment, i.e., a greater dose of a drug. The
diagnostic method may take one of several forms: for example, a direct DNA test (i.e., genotyping or
haplotyping one or more of the polymorphic sites in the SLC26A2 gene), a serological test, or a
physical exam measurement. The only requirement is that there be a good correlation between the
diagnostic test results and the underlying SLC26A2 genotype or haplotype that is in turn correlated
with the clinical response. In a preferred embodiment, this diagnostic method uses the predictive
haplotyping method described above.

In another embodiment, the invention provides an isolated polynucleotide comprising a
polymorphic variant of the SLC26A2 gene or a fragment of the gene which contains at least one of the
novel polymorphic sites described herein. The nucleotide sequence of a variant SLC26A2 gene is
identical to the reference genomic sequence for those portions of the gene examined, as described in
the Examples below, except that it comprises a different nucleotide at one or more of the novel
polymorphic sites PS1, PS2, PS3 and PS5, and may also comprise an additional polymorphism of
thymine at PS4. Similarly, the nucleotide sequence of a variant fragment of the SLC26A2 gene is
identical to the corresponding portion of the reference sequence except for having a different
nucleotide at one or more of the novel polymorphic sites described herein. Thus, the invention
specifically does not include polynucleotides comprising a nucleotide sequence identical to the
reference sequence of the SLC26A2 gene, which is defined by haplotype 3 (or other reported
SLC26A2 sequences), or to portions of the reference sequence (or other reported SLC26A2
sequences), except for genotyping oligonucleotides as described above.

The location of a polymorphism in a variant gene or fragment is identified by aligning its
sequence against SEQ ID NO:1. The polymorphism is selected from the group consisting of adenine
at PS1, guanine at PS2, adenine at PS3 and thymine at PS5. In a preferred embodiment, the
polymorphic variant comprises a naturally-occurring isogene of the SLC26A2 gene which is defined
by any one of haplotypes 1-2 and 4-5 shown in Table 5 below.
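Once a variant sequence has been aligned against SEQ ID NO:1, reading off which novel alleles it carries is a position lookup. In the sketch below, the coordinates assigned to PS1, PS2, PS3 and PS5 are hypothetical placeholders (the actual positions in SEQ ID NO:1 are given elsewhere in the specification and are not reproduced here). The sketch also assumes the alignment has no gaps, so that sequence index i - 1 holds the base at reference position i.

```python
# HYPOTHETICAL coordinates in SEQ ID NO:1; only the novel alleles
# (adenine at PS1, guanine at PS2, adenine at PS3, thymine at PS5) come from the text.
NOVEL_ALLELES = {
    "PS1": (101, "A"),
    "PS2": (205, "G"),
    "PS3": (310, "A"),
    "PS5": (450, "T"),
}


def novel_polymorphisms(aligned_variant, sites=NOVEL_ALLELES):
    """Return the names of the novel alleles carried by a gap-free aligned sequence."""
    found = []
    for name, (position, allele) in sites.items():
        if position <= len(aligned_variant) and aligned_variant[position - 1].upper() == allele:
            found.append(name)
    return found


# Toy example: a 500-base sequence of C's carrying adenine at the (hypothetical) PS3 position.
bases = ["C"] * 500
bases[309] = "A"
carried = novel_polymorphisms("".join(bases))  # ['PS3']
```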

Polymorphic variants of the invention may be prepared by isolating a clone containing the
SLC26A2 gene from a human genomic library. The clone may be sequenced to determine the identity
of the nucleotides at the novel polymorphic sites described herein. Any particular variant claimed
herein could be prepared from this clone by performing in vitro mutagenesis using procedures
well known in the art.

SLC26A2 isogenes may be isolated using any method that allows separation of the two
"copies" of the SLC26A2 gene present in an individual, which, as readily understood by the skilled
artisan, may be the same allele or different alleles. Separation methods include targeted in vivo
cloning (TIVC) in yeast as described in WO 98/01573, U.S. Patent No. 5,866,404, and U.S. Patent No.
5,972,614. Another method, which is described in U.S. Patent No. 5,972,614, uses an allele specific
oligonucleotide in combination with primer extension and exonuclease degradation to generate
hemizygous DNA targets. Yet other methods are single molecule dilution (SMD) as described in
Ruano et al., Proc. Natl. Acad. Sci. 87:6296-6300, 1990; and allele specific PCR (Ruano et al., 1989,
supra; Ruano et al., 1991, supra; Michalatos-Beloin et al., supra).

The invention also provides SLC26A2 genome anthologies, which are collections of 
SLC26A2 isogenes found in a given population. The population may be any group of at least two 
individuals, including but not limited to a reference population, a population group, a family 
population, a clinical population, and a same sex population. A SLC26A2 genome anthology may 
comprise individual SLC26A2 isogenes stored in separate containers such as microtest tubes, separate 
wells of a microtitre plate and the like. Alternatively, two or more groups of the SLC26A2 isogenes in
the anthology may be stored in separate containers. Individual isogenes or groups of isogenes in a
genome anthology may be stored in any convenient and stable form, including but not limited to in
buffered solutions, as DNA precipitates, freeze-dried preparations and the like. A preferred SLC26A2
genome anthology of the invention comprises a set of isogenes defined by the haplotypes shown in 
Table 5 below. 

An isolated polynucleotide containing a polymorphic variant nucleotide sequence of the
invention may be operably linked to one or more expression regulatory elements in a recombinant
expression vector capable of being propagated and expressing the encoded SLC26A2 protein in a
prokaryotic or a eukaryotic host cell. Examples of expression regulatory elements which may be used
include, but are not limited to, the lac system, operator and promoter regions of phage lambda, yeast
promoters, and promoters derived from vaccinia virus, adenovirus, retroviruses, or SV40. Other
regulatory elements include, but are not limited to, appropriate leader sequences, termination codons,
polyadenylation signals, and other sequences required for the appropriate transcription and subsequent
translation of the nucleic acid sequence in a given host cell. Of course, the correct combinations of
expression regulatory elements will depend on the host system used. In addition, it is understood that
the expression vector contains any additional elements necessary for its transfer to and subsequent
replication in the host cell. Examples of such elements include, but are not limited to, origins of
replication and selectable markers. Such expression vectors are commercially available or are readily
constructed using methods known to those in the art (e.g., F. Ausubel et al., 1987, in "Current
Protocols in Molecular Biology", John Wiley and Sons, New York, New York). Host cells which may
be used to express the variant SLC26A2 sequences of the invention include, but are not limited to,
eukaryotic and mammalian cells, such as animal, plant, insect and yeast cells, and prokaryotic cells,
such as E. coli, or algal cells as known in the art. The recombinant expression vector may be
introduced into the host cell using any method known to those in the art including, but not limited to,
microinjection, electroporation, particle bombardment, transduction, and transfection using DEAE-
dextran, lipofection, or calcium phosphate (see e.g., Sambrook et al. (1989) in "Molecular Cloning: A
Laboratory Manual", Cold Spring Harbor Press, Plainview, New York). In a preferred aspect,
eukaryotic expression vectors that function in eukaryotic cells, and preferably mammalian cells, are
used. Non-limiting examples of such vectors include vaccinia virus vectors, adenovirus vectors,
herpes virus vectors, and baculovirus transfer vectors. Preferred eukaryotic cell lines include COS
cells, CHO cells, HeLa cells, NIH/3T3 cells, and embryonic stem cells (Thomson, J.A. et al., 1998,
Science 282:1145-1147). Particularly preferred host cells are mammalian cells.

As will be readily recognized by the skilled artisan, expression of polymorphic variants of the
SLC26A2 gene will produce SLC26A2 mRNAs varying from each other at any polymorphic site
retained in the spliced and processed mRNA molecules. These mRNAs can be used for the
preparation of a SLC26A2 cDNA comprising a nucleotide sequence which is a polymorphic variant of
the SLC26A2 reference coding sequence shown in Figure 2. Thus, the invention also provides
SLC26A2 mRNAs and corresponding cDNAs which comprise a nucleotide sequence that is identical
to SEQ ID NO:2 (Fig. 2), or its corresponding RNA sequence, except for having one or more
polymorphisms selected from the group consisting of adenine at a position corresponding to nucleotide
1046 and thymine at a position corresponding to nucleotide 2065, and may also comprise an additional
polymorphism of thymine at a position corresponding to nucleotide 1721. A particularly preferred
polymorphic cDNA variant comprises the coding sequence of a SLC26A2 isogene defined by
haplotype 1, 2, 4 or 5. Fragments of these variant mRNAs and cDNAs are included in the scope of
the invention, provided they contain the novel polymorphisms described herein. The invention
specifically excludes polynucleotides identical to previously identified and characterized SLC26A2
cDNAs and fragments thereof. Polynucleotides comprising a variant RNA or DNA sequence may be
isolated from a biological sample using well-known molecular biological procedures or may be
chemically synthesized.
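The cDNA positions recited above can be cross-checked against the amino acid positions recited for the protein variants. The short sketch below assumes that nucleotide 1 of SEQ ID NO:2 is the A of the ATG start codon; this is an assumption, since Figure 2 is not reproduced here. It maps each position to a codon number and to the position within that codon.

```python
def codon_of(cdna_position):
    """Map a 1-based coding-sequence position (assumed: 1 = A of the ATG start codon)
    to (codon number, position within the codon, 1 to 3)."""
    return (cdna_position - 1) // 3 + 1, (cdna_position - 1) % 3 + 1


for nt, change in [(1046, "F349Y"), (1721, "T574I"), (2065, "T689S")]:
    print(nt, codon_of(nt), change)
# 1046 (349, 2) F349Y
# 1721 (574, 2) T574I
# 2065 (689, 1) T689S
```

Under that assumption, the three cDNA positions fall in codons 349, 574 and 689, which match the amino acid positions given for the protein variants. The codon positions are also consistent with the substitutions: an adenine at the second position of a phenylalanine codon (TTT/TTC) gives a tyrosine codon (TAT/TAC), and a thymine at the first position of a threonine codon (ACN) gives a serine codon (TCN).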

As used herein, a polymorphic variant of a SLC26A2 gene fragment comprises at least one
novel polymorphism identified herein and has a length of at least 10 nucleotides and may range up to
the full length of the gene. Preferably, such fragments are between 100 and 3000 nucleotides in
length, more preferably between 200 and 2000 nucleotides in length, and most preferably between
500 and 1000 nucleotides in length.

In describing the SLC26A2 polymorphic sites identified herein, reference is made to the sense
strand of the gene for convenience. However, as recognized by the skilled artisan, nucleic acid
molecules containing the SLC26A2 gene may be complementary double stranded molecules and thus
reference to a particular site on the sense strand refers as well to the corresponding site on the
complementary antisense strand. Thus, reference may be made to the same polymorphic site on either
strand and an oligonucleotide may be designed to hybridize specifically to either strand at a target
region containing the polymorphic site. Thus, the invention also includes single-stranded
polynucleotides which are complementary to the sense strand of the SLC26A2 genomic variants
described herein.
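The correspondence between a site on the sense strand and the same site on the antisense strand can be made concrete with a reverse complement. In this sketch, the antisense strand is written 5' to 3', so sense position i in a sequence of length L pairs with antisense position L - i + 1. The six-base sequence is a toy example, not an SLC26A2 sequence.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def antisense(sense):
    """Complementary strand written 5' to 3' (the reverse complement)."""
    return sense.translate(COMPLEMENT)[::-1]


def antisense_position(sense_position, length):
    """1-based antisense-strand position that base-pairs with sense_position."""
    return length - sense_position + 1


sense = "ATGCCT"
anti = antisense(sense)                          # "AGGCAT"
pos = antisense_position(2, len(sense))          # 5
paired_base = anti[pos - 1]                      # "A", pairing with the T at sense position 2
```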

Polynucleotides comprising a polymorphic gene variant or fragment may be useful for
therapeutic purposes. For example, where a patient could benefit from expression, or increased
expression, of a particular SLC26A2 protein isoform, an expression vector encoding the isoform may
be administered to the patient. The patient may be one who lacks the SLC26A2 isogene encoding that
isoform or may already have at least one copy of that isogene.

In other situations, it may be desirable to decrease or block expression of a particular
SLC26A2 isogene. Expression of a SLC26A2 isogene may be turned off by transforming a targeted
organ, tissue or cell population with an expression vector that expresses high levels of untranslatable
mRNA for the isogene. Alternatively, oligonucleotides directed against the regulatory regions (e.g.,
promoter, introns, enhancers, 3' untranslated region) of the isogene may block transcription.
Oligonucleotides targeting the transcription initiation site, e.g., between positions -10 and +10 from
the start site, are preferred. Similarly, inhibition of transcription can be achieved using
oligonucleotides that base-pair with region(s) of the isogene DNA to form triplex DNA (see e.g., Gee
et al. in Huber, B.E. and B.I. Carr, Molecular and Immunologic Approaches, Futura Publishing Co.,
Mt. Kisco, N.Y., 1994). Antisense oligonucleotides may also be designed to block translation of
SLC26A2 mRNA transcribed from a particular isogene. It is also contemplated that ribozymes may
be designed that can catalyze the specific cleavage of SLC26A2 mRNA transcribed from a particular
isogene.

The oligonucleotides may be delivered to a target cell or tissue by expression from a vector
introduced into the cell or tissue in vivo or ex vivo. Alternatively, the oligonucleotides may be
formulated as a pharmaceutical composition for administration to the patient. Oligoribonucleotides
and/or oligodeoxynucleotides intended for use as antisense oligonucleotides may be modified to
increase stability and half-life. Possible modifications include, but are not limited to, phosphorothioate
or 2'-O-methyl linkages, and the inclusion of nontraditional bases such as inosine and queosine, as
well as acetyl-, methyl-, thio-, and similarly modified forms of adenine, cytosine, guanine, thymine,
and uracil which are not as easily recognized by endogenous nucleases.

The invention also provides an isolated polypeptide comprising a polymorphic variant of the
reference SLC26A2 amino acid sequence shown in Figure 3. The
SLC26A2 polypeptide or fragment of the invention is identified by aligning its sequence against SEQ
ID NO:3 (Fig. 3). A SLC26A2 protein variant of the invention comprises an amino acid sequence
identical to SEQ ID NO:3 except for having one or more variant amino acids selected from the group
consisting of tyrosine at a position corresponding to amino acid position 349 and serine at a position
corresponding to amino acid position 689, and may also comprise an additional variant amino acid of
isoleucine at a position corresponding to amino acid position 574. The invention specifically excludes
amino acid sequences identical to those previously identified for SLC26A2, including SEQ ID NO:3,
and previously described fragments thereof. SLC26A2 protein variants included within the invention
comprise all amino acid sequences based on SEQ ID NO:3 and having the combination of amino acid
variations described in Table 2 below. In preferred embodiments, a SLC26A2 protein variant of the
invention is encoded by an isogene defined by one of the observed haplotypes shown ...

Table 2. Novel Polymorphic Variants of SLC26A2

Polymorphic        Amino Acid Position and Identities
Variant Number     349     574     689

1                  F       T       S
2                  F       I       S
3                  Y       T       T
4                  Y       T       S
5                  Y       I       T
6                  Y       I       S
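Table 2 can be regenerated mechanically, which is a useful check on the OCR-recovered rows. The sketch below reads the reference residues (F349, T574, T689, as inferred from the table and the variant descriptions above) and the variant residues (Y349, I574, S689) off the text. It enumerates every combination and keeps those carrying at least one of the novel variants, Y349 or S689; I574 is described only as an additional variant, so combinations with I574 alone are excluded.

```python
from itertools import product

# Residues as read from the text and Table 2 (reference residues are inferred).
REFERENCE = {349: "F", 574: "T", 689: "T"}
VARIANT = {349: "Y", 574: "I", 689: "S"}
NOVEL_POSITIONS = {349, 689}  # I574 alone does not define a novel variant


def enumerate_table2():
    positions = sorted(REFERENCE)  # [349, 574, 689]
    rows = []
    for combo in product(*[(REFERENCE[p], VARIANT[p]) for p in positions]):
        has_novel = any(
            residue == VARIANT[p]
            for p, residue in zip(positions, combo)
            if p in NOVEL_POSITIONS
        )
        if has_novel:
            rows.append("".join(combo))
    return rows


for number, row in enumerate(enumerate_table2(), start=1):
    print(number, *row)
```

Because `itertools.product` varies the last position fastest, the six combinations come out in the same order as variants 1 to 6 in Table 2.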

The invention also includes SLC26A2 peptide variants, which are any fragments of a
SLC26A2 protein variant that contain one or more of the amino acid variations shown in Table 2. A
SLC26A2 peptide variant is at least 6 amino acids in length and is preferably any number between 6
and 30 amino acids long, more preferably between 10 and 25, and most preferably between 15 and 20
amino acids long. Such SLC26A2 peptide variants may be useful as antigens to generate antibodies
specific for one of the above SLC26A2 isoforms. In addition, the SLC26A2 peptide variants may be
useful in drug screening assays.
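Candidate peptide variants of a given length can be enumerated by sliding a window across the protein sequence and keeping every window that covers the variant position. The sketch below enforces the 6-residue minimum stated above. Because SEQ ID NO:3 is not reproduced here, the example uses a 20-residue toy sequence and a hypothetical variant position.

```python
def peptide_windows(protein, variant_pos, length):
    """All peptides of `length` residues from `protein` that include the 1-based
    `variant_pos`. Returns (start, peptide) pairs with 1-based start positions."""
    if not 6 <= length <= len(protein):
        raise ValueError("length must be at least 6 and no longer than the protein")
    if not 1 <= variant_pos <= len(protein):
        raise ValueError("variant position lies outside the protein")
    first = max(1, variant_pos - length + 1)
    last = min(variant_pos, len(protein) - length + 1)
    return [(s, protein[s - 1 : s - 1 + length]) for s in range(first, last + 1)]


# Toy 20-residue sequence; residue 10 (L) stands in for a variant position.
toy_protein = "ACDEFGHIKLMNPQRSTVWY"
windows = peptide_windows(toy_protein, 10, 6)
```

Applied to the real sequence, this would list every 15- to 20-residue peptide spanning position 349 or 689, for example as candidate antigens for isoform-specific antibodies.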

A SLC26A2 variant protein or peptide of the invention may be prepared by chemical synthesis
or by expressing one of the variant SLC26A2 genomic and cDNA sequences as described above.
Alternatively, the SLC26A2 protein variant may be isolated from a biological sample of an individual
having a SLC26A2 isogene which encodes the variant protein. Where the sample contains two
different SLC26A2 isoforms (i.e., the individual has different SLC26A2 isogenes), a particular
SLC26A2 isoform of the invention can be isolated by immunoaffinity chromatography using an
antibody which specifically binds to that particular SLC26A2 isoform but does not bind to the other
SLC26A2 isoform.

The expressed or isolated SLC26A2 protein may be detected by methods known in the art,
including Coomassie blue staining, silver staining, and Western blot analysis using antibodies specific
for the isoform of the SLC26A2 protein as discussed herein. SLC26A2 variant proteins can be
purified by standard protein purification procedures known in the art, including differential
precipitation, molecular sieve chromatography, ion-exchange chromatography, isoelectric focusing,
gel electrophoresis, affinity and immunoaffinity chromatography and the like (Ausubel et al., 1987,
in Current Protocols in Molecular Biology, John Wiley and Sons, New York, New York). In the case
of immunoaffinity chromatography, antibodies specific for a particular polymorphic variant may be
used.

A polymorphic variant SLC26A2 gene of the invention may also be fused in frame with a
heterologous sequence to encode a chimeric SLC26A2 protein. The non-SLC26A2 portion of the
chimeric protein may be recognized by a commercially available antibody. In addition, the chimeric
protein may also be engineered to contain a cleavage site located between the SLC26A2 and non-
SLC26A2 portions so that the SLC26A2 protein may be cleaved and purified away from the non-
SLC26A2 portion.

An additional embodiment of the invention relates to using a novel SLC26A2 protein isoform
in any of a variety of drug screening assays. Such screening assays may be performed to identify
agents that bind specifically to all known SLC26A2 protein isoforms or to only a subset of one or
more of these isoforms. The agents may be from chemical compound libraries, peptide libraries and
the like. The SLC26A2 protein or peptide variant may be free in solution or affixed to a solid support.
In one embodiment, high throughput screening of compounds for binding to a SLC26A2 variant may
be accomplished using the method described in PCT application WO 84/03565, in which large
numbers of test compounds are synthesized on a solid substrate, such as plastic pins or some other
surface, contacted with the SLC26A2 protein(s) of interest and then washed. Bound SLC26A2
protein(s) are then detected using methods well known in the art.

In another embodiment, a novel SLC26A2 protein isoform may be used in assays to measure
the binding affinities of one or more candidate drugs targeting the SLC26A2 protein.

In yet another embodiment, when a particular SLC26A2 haplotype or group of SLC26A2
haplotypes encodes a SLC26A2 protein variant with an amino acid sequence distinct from that of
SLC26A2 protein isoforms encoded by other SLC26A2 haplotypes, then detection of that particular
SLC26A2 haplotype or group of SLC26A2 haplotypes may be accomplished by detecting expression
of the encoded SLC26A2 protein variant using any of the methods described herein or otherwise
commonly known to the skilled artisan.

In another embodiment, the invention provides antibodies specific for and immunoreactive
with one or more of the novel SLC26A2 variant proteins described herein. The antibodies may be
either monoclonal or polyclonal in origin. The SLC26A2 protein or peptide variant used to generate
the antibodies may be from natural or recombinant sources or produced by chemical synthesis using
synthesis techniques known in the art. If the SLC26A2 protein variant is of insufficient size to be
antigenic, it may be conjugated, complexed, or otherwise covalently linked to a carrier molecule to
enhance the antigenicity of the peptide. Examples of carrier molecules include, but are not limited to,
albumins (e.g., human, bovine, fish, ovine), and keyhole limpet hemocyanin (Basic and Clinical
Immunology, 1991, Eds. D.P. Stites and A.I. Terr, Appleton and Lange, Norwalk, Connecticut, San
Mateo, California).

In one embodiment, an antibody specifically immunoreactive with one of the novel protein
isoforms described herein is administered to an individual to neutralize activity of the SLC26A2
isoform expressed by that individual. The antibody may be formulated as a pharmaceutical
composition which includes a pharmaceutically acceptable carrier.

Antibodies specific for and immunoreactive with one of the novel protein isoforms described
herein may be used to immunoprecipitate the SLC26A2 protein variant from solution as well as react
with SLC26A2 protein isoforms on Western or immunoblots of polyacrylamide gels on membrane
supports or substrates. In another preferred embodiment, the antibodies will detect SLC26A2 protein
isoforms in paraffin or frozen tissue sections, or in cells which have been fixed or unfixed and
prepared on slides, coverslips, or the like, for use in immunocytochemical, immunohistochemical, and
immunofluorescence techniques.

In another embodiment, an antibody specifically immunoreactive with one of the novel
SLC26A2 protein variants described herein is used in immunoassays to detect this variant in biological
samples. In this method, an antibody of the present invention is contacted with a biological sample,
and the formation of a complex between the SLC26A2 protein variant and the antibody is detected.
As described, suitable immunoassays include radioimmunoassay, Western blot assay,
immunofluorescent assay, enzyme linked immunoassay (ELISA), chemiluminescent assay,
immunohistochemical assay, immunocytochemical assay, and the like (see, e.g., Principles and
Practice of Immunoassay, 1991, Eds. Christopher P. Price and David J. Newman, Stockton Press, New
York, New York; Current Protocols in Molecular Biology, 1987, Eds. Ausubel et al., John Wiley and
Sons, New York, New York). Standard techniques known in the art for ELISA are described in
Methods in Immunodiagnosis, 2nd Ed., Eds. Rose and Bigazzi, John Wiley and Sons, New York,
1980; and Campbell et al., 1984, Methods in Immunology, W.A. Benjamin, Inc. Such assays may be
direct, indirect, competitive, or noncompetitive as described in the art (see, e.g., Principles and
Practice of Immunoassay, 1991, Eds. Christopher P. Price and David J. Newman, Stockton Press, New
York, New York; and Oellerich, M., 1984, J. Clin. Chem. Clin. Biochem., 22:895-904). Proteins may be
isolated from test specimens and biological samples by conventional methods, as described in Current
Protocols in Molecular Biology, supra.

Exemplary antibody molecules for use in the detection and therapy methods of the present
invention are intact immunoglobulin molecules, substantially intact immunoglobulin molecules, or
those portions of immunoglobulin molecules that contain the antigen binding site. Polyclonal or
monoclonal antibodies may be produced by methods conventionally known in the art (e.g., Kohler and
Milstein, 1975, Nature, 256:495-497; Campbell, Monoclonal Antibody Technology, the Production
and Characterization of Rodent and Human Hybridomas, 1985, In: Laboratory Techniques in
Biochemistry and Molecular Biology, Eds. Burdon et al., Volume 13, Elsevier Science Publishers,
Amsterdam). The antibodies or antigen binding fragments thereof may also be produced by genetic
engineering. The technology for expression of both heavy and light chain genes in E. coli is the
subject of PCT patent applications, publication number WO 901443, WO 901443 and WO 9014424,
and of Huse et al., 1989, Science, 246:1275-1281. The antibodies may also be humanized (e.g.,
Queen, C. et al., 1989, Proc. Natl. Acad. Sci. USA 86:10029).

Effect(s) of the polymorphisms identified herein on expression of SLC26A2 may be
investigated by preparing recombinant cells and/or nonhuman recombinant organisms, preferably
recombinant animals, containing a polymorphic variant of the SLC26A2 gene. As used herein,
"expression" includes but is not limited to one or more of the following: transcription of the gene into
precursor mRNA; splicing and other processing of the precursor mRNA to produce mature mRNA;
mRNA stability; translation of the mature mRNA into SLC26A2 protein (including codon usage and
tRNA availability); and glycosylation and/or other modifications of the translation product, if required
for proper expression and function.

To prepare a recombinant cell of the invention, the desired SLC26A2 isogene may be
introduced into the cell in a vector such that the isogene remains extrachromosomal. In such a
situation, the gene will be expressed by the cell from the extrachromosomal location. In a preferred
embodiment, the SLC26A2 isogene is introduced into a cell in such a way that it recombines with the
endogenous SLC26A2 gene present in the cell. Such recombination requires the occurrence of a
double recombination event, thereby resulting in the desired SLC26A2 gene polymorphism. Vectors
for the introduction of genes both for recombination and for extrachromosomal maintenance are
known in the art, and any suitable vector or vector construct may be used in the invention. Methods
such as electroporation, particle bombardment, calcium phosphate co-precipitation and viral
transduction for introducing DNA into cells are known in the art; therefore, the choice of method may
lie with the competence and preference of the skilled practitioner. Examples of cells into which the
SLC26A2 isogene may be introduced include, but are not limited to, continuous culture cells, such as
COS, NIH/3T3, and primary or cultured cells of the relevant tissue type, i.e., cells that express
SLC26A2. Such recombinant cells can be used to compare the biological activities of the different
protein variants.

Recombinant nonhuman organisms, i.e., transgenic animals, expressing a variant SLC26A2
gene are prepared using standard procedures known in the art. Preferably, a construct comprising the
variant gene is introduced into a nonhuman animal or an ancestor of the animal at an embryonic stage,
i.e., the one-cell stage, or generally not later than about the eight-cell stage. Transgenic animals
carrying the constructs of the invention can be made by several methods known to those having skill in
the art. One method involves transfecting into the embryo a retrovirus constructed to contain one or
more insulator elements, a gene or genes of interest, and other components known to those skilled in
the art; see, e.g., U.S. Patent No. 5,610,053. Another method involves directly injecting a transgene
into the embryo. A third method involves the use of embryonic stem cells. Examples of animals into
which the SLC26A2 isogenes may be introduced include, but are not limited to, mice, rats, other
rodents, and nonhuman primates (see "The Introduction of Foreign Genes into Mice" and the cited
references therein, In: Recombinant DNA, Eds. J.D. Watson, M. Gilman, J. Witkowski, and M. Zoller,
W.H. Freeman and Company, New York, pages 254-272). Transgenic animals stably expressing a human
SLC26A2 isogene and producing human SLC26A2 protein can be used as biological models for
studying diseases related to abnormal SLC26A2 expression and/or activity, and for screening and
assaying various candidate drugs, compounds, and treatment regimens to reduce the symptoms or
effects of these diseases.

An additional embodiment of the invention relates to pharmaceutical compositions for treating
disorders affected by expression or function of a novel SLC26A2 isogene described herein. The
pharmaceutical composition may comprise any of the following active ingredients: a polynucleotide
comprising one of these novel SLC26A2 isogenes; an antisense oligonucleotide directed against one of
the novel SLC26A2 isogenes; a polynucleotide encoding such an antisense oligonucleotide; or another
compound which inhibits expression of a novel SLC26A2 isogene described herein. Preferably, the
composition contains the active ingredient in a therapeutically effective amount. By therapeutically
effective amount is meant that one or more of the symptoms relating to disorders affected by
expression or function of a novel SLC26A2 isogene is reduced and/or eliminated. The composition
also comprises a pharmaceutically acceptable carrier, examples of which include, but are not limited
to, saline, buffered saline, dextrose, and water. Those skilled in the art may employ a formulation
most suitable for the active ingredient, whether it is a polynucleotide, oligonucleotide, protein, peptide
or small molecule antagonist. The pharmaceutical composition may be administered alone or in
combination with at least one other agent, such as a stabilizing compound. Administration of the
pharmaceutical composition may be by any number of routes including, but not limited to, oral,
intravenous, intramuscular, intra-arterial, intramedullary, intrathecal, intraventricular, intradermal,
transdermal, subcutaneous, intraperitoneal, intranasal, enteral, topical, sublingual, or rectal. Further
details on techniques for formulation and administration may be found in the latest edition of
Remington's Pharmaceutical Sciences (Mack Publishing Co., Easton, PA).

For any composition, determination of the therapeutically effective dose of active ingredient
and/or the appropriate route of administration is well within the capability of those skilled in the art.
For example, the dose can be estimated initially either in cell culture assays or in animal models. The
animal model may also be used to determine the appropriate concentration range and route of
administration. Such information can then be used to determine useful doses and routes for
administration in humans. The exact dosage will be determined by the practitioner, in light of factors
relating to the patient requiring treatment, including but not limited to severity of the disease state,
general health, age, weight and gender of the patient, diet, time and frequency of administration, other
drugs being taken by the patient, and tolerance/response to the treatment.

Any or all analytical and mathematical operations involved in practicing the methods of the
present invention may be implemented by a computer. In addition, the computer may execute a
program that generates views (or screens) displayed on a display device and with which the user can
interact to view and analyze large amounts of information relating to the SLC26A2 gene and its
genomic variation, including chromosome location, gene structure, gene family, gene expression
data, polymorphism data, genetic sequence data, clinical data, and population data (e.g., data on
ethnogeographic origin, clinical responses, genotypes, and haplotypes for one or more populations).
The SLC26A2 polymorphism data described herein may be stored as part of a relational database (e.g.,
an instance of an Oracle database or a set of ASCII flat files). These polymorphism data may be
stored on the computer's hard drive or may, for example, be stored on a CD-ROM or on one or more
other storage devices accessible by the computer. For example, the data may be stored on one or more
databases in communication with the computer via a network.
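The specification does not fix a database schema, so the following is only a minimal sketch of how the polymorphism data might be held relationally. It uses Python's standard-library sqlite3 module as a stand-in for the Oracle instance or flat files mentioned above, loads the five polymorphic sites reported in Table 3 of Example 1, and runs one example query. The table name, column names and the build_db helper are hypothetical.

```python
import sqlite3

# Hypothetical schema; the specification does not define one.
SCHEMA = """
CREATE TABLE polymorphic_site (
    ps_number     TEXT PRIMARY KEY,  -- e.g. 'PS3'
    poly_id       INTEGER,           -- PolyId from Table 3
    position      INTEGER,           -- nucleotide position (Figure 1 numbering)
    reference     TEXT,              -- reference allele
    variant       TEXT,              -- variant allele
    cds_position  INTEGER,           -- NULL for sites outside the coding sequence
    aa_variant    TEXT               -- amino-acid change, NULL if none
)
"""

# Rows transcribed from Table 3.
SITES = [
    ("PS1", 3759084, 136098, "G", "A", None, None),
    ("PS2", 3759090, 136195, "A", "G", None, None),
    ("PS3", 3759094, 139338, "T", "A", 1046, "F349Y"),
    ("PS4", 3759102, 140013, "C", "T", 1721, "T574I"),
    ("PS5", 3759106, 140357, "A", "T", 2065, "T689S"),
]


def build_db(path=":memory:"):
    """Create the (hypothetical) table and load the Table 3 sites."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO polymorphic_site VALUES (?, ?, ?, ?, ?, ?, ?)", SITES
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_db()
    # Example query: coding-region sites that change the encoded amino acid.
    for row in conn.execute(
        "SELECT ps_number, aa_variant FROM polymorphic_site "
        "WHERE aa_variant IS NOT NULL ORDER BY position"
    ):
        print(row)
```

An in-memory database keeps the sketch self-contained; passing a file path to build_db would persist the same table to disk.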
Preferred embodiments of the invention are described in the following examples. Other
embodiments within the scope of the claims herein will be apparent to one skilled in the art from
consideration of the specification or practice of the invention as disclosed herein. It is intended that
the specification, together with the examples, be considered exemplary only, with the scope and spirit
of the invention being indicated by the claims which follow the examples.


EXAMPLES 

The Examples herein are meant to exemplify the various aspects of carrying out the invention
and are not intended to limit the scope of the invention in any way. The Examples do not include
detailed descriptions for conventional methods employed, such as in the performance of genomic
DNA isolation, PCR and sequencing procedures. Such methods are well-known to those skilled in the
art and are described in numerous publications, for example, Sambrook, Fritsch, and Maniatis,
"Molecular Cloning: A Laboratory Manual", 2nd Edition, Cold Spring Harbor Laboratory Press, USA,
(1989).

EXAMPLE 1

This example illustrates examination of various regions of the SLC26A2 gene for 
polymorphic sites. 

Amplification of Target Regions

The following target regions of the SLC26A2 gene were amplified using PCR primer pairs.
The primers used for each region are represented below by providing the nucleotide positions of their
initial and final nucleotides, which correspond to positions in Figure 1.
PCR Primer Pairs

Fragment No.   Forward Primer    Reverse Primer                  PCR Product
Fragment 1     135711-135734     complement of 136380-136401     691 nt
Fragment 2     136045-136069     complement of 136680-136702     658 nt
Fragment 3     136095-136117     complement of 136680-136702     608 nt
Fragment 4     136312-136336     complement of 136933-136954     643 nt
Fragment 5     136671-136694     complement of 137333-137355     685 nt
Fragment 6     138745-138768     complement of 139392-139413     669 nt
Fragment 7     139104-139125     complement of 139789-139810     707 nt
Fragment 8     139319-139341     complement of 139951-139972     654 nt
Fragment 9     139641-139662     complement of 140203-140224     584 nt
Fragment 10    139917-139937     complement of 140471-140494     578 nt
Fragment 11    140118-140138     complement of 140849-140869     752 nt
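The PCR product sizes in the primer table follow directly from the primer coordinates if both ends are counted inclusively: the product runs from the first base of the forward primer to the last base of the reverse-primer site. The sketch below checks that reading for all eleven fragments; the dictionary names and helper functions are illustrative, not part of the original protocol.

```python
# Inclusive primer coordinates (Figure 1 numbering) from the PCR primer table:
# fragment -> ((forward start, forward end), (reverse-site start, reverse-site end)).
# The reverse primer anneals to the complement of the listed range.
PRIMERS = {
    1: ((135711, 135734), (136380, 136401)),
    2: ((136045, 136069), (136680, 136702)),
    3: ((136095, 136117), (136680, 136702)),
    4: ((136312, 136336), (136933, 136954)),
    5: ((136671, 136694), (137333, 137355)),
    6: ((138745, 138768), (139392, 139413)),
    7: ((139104, 139125), (139789, 139810)),
    8: ((139319, 139341), (139951, 139972)),
    9: ((139641, 139662), (140203, 140224)),
    10: ((139917, 139937), (140471, 140494)),
    11: ((140118, 140138), (140849, 140869)),
}
# PCR product sizes (nt) as reported in the table.
REPORTED_NT = {1: 691, 2: 658, 3: 608, 4: 643, 5: 685, 6: 669,
               7: 707, 8: 654, 9: 584, 10: 578, 11: 752}


def product_length(forward, reverse):
    """Amplicon length: first base of the forward primer through the last
    base of the reverse-primer site, both ends counted."""
    return reverse[1] - forward[0] + 1


def primer_length(span):
    """Length of a primer from its inclusive (start, end) coordinates."""
    return span[1] - span[0] + 1


if __name__ == "__main__":
    for frag, (fwd, rev) in PRIMERS.items():
        size = product_length(fwd, rev)
        print(f"Fragment {frag}: {size} nt (table: {REPORTED_NT[frag]} nt), "
              f"primers {primer_length(fwd)}/{primer_length(rev)} nt")
```

Every computed size matches the reported product size, which supports reading the coordinate ranges as inclusive.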

These primer pairs were used in PCR reactions containing genomic DNA isolated from
immortalized cell lines for each member of the Index Repository. The PCR reactions were carried out
under the following conditions:



Reaction volume                                            = 10 µl
10 X Advantage 2 Polymerase reaction buffer (Clontech)     = 1 µl
100 ng of human genomic DNA                                = 1 µl
10 mM dNTP                                                 = 0.4 µl
Advantage 2 Polymerase enzyme mix (Clontech)               = 0.2 µl
Forward Primer (10 µM)                                     = 0.4 µl
Reverse Primer (10 µM)                                     = 0.4 µl
Water                                                      = 6.6 µl

Amplification profile:

97°C - 2 min.      1 cycle

97°C - 15 sec.
70°C - 45 sec.     10 cycles
72°C - 45 sec.

97°C - 15 sec.
64°C - 45 sec.     35 cycles
72°C - 45 sec.
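The reaction set-up lends itself to two routine calculations: checking that the components add up to the 10 µl reaction volume, and working out final concentrations with C1·V1 = C2·V2 (e.g., 0.4 µl of 10 µM primer in 10 µl gives 0.4 µM). The sketch below also scales the shared components into a master mix for one primer pair; the 10% pipetting overage and the choice to add template DNA separately to each well are illustrative assumptions, not part of the described protocol.

```python
# Per-reaction volumes (ul) from the reaction set-up above.
PER_REACTION_UL = {
    "10X Advantage 2 reaction buffer": 1.0,
    "human genomic DNA (100 ng)": 1.0,
    "10 mM dNTP": 0.4,
    "Advantage 2 enzyme mix": 0.2,
    "forward primer (10 uM)": 0.4,
    "reverse primer (10 uM)": 0.4,
    "water": 6.6,
}
REACTION_VOLUME_UL = 10.0


def final_concentration(stock, added_ul, total_ul=REACTION_VOLUME_UL):
    """C1*V1 = C2*V2 solved for C2; result is in the units of `stock`."""
    return stock * added_ul / total_ul


def master_mix(n_reactions, overage=0.10,
               per_well=("human genomic DNA (100 ng)",)):
    """Volumes (ul) of the shared components for n reactions of one primer
    pair. Components listed in `per_well` are left out because they are
    assumed to be pipetted into each well separately; the overage is an
    illustrative assumption."""
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2)
            for name, vol in PER_REACTION_UL.items() if name not in per_well}


if __name__ == "__main__":
    print("sum of components:", round(sum(PER_REACTION_UL.values()), 2), "ul")
    print("final primer concentration:", final_concentration(10, 0.4), "uM")
    print("final dNTP concentration:", final_concentration(10, 0.4), "mM")
    print(master_mix(96))
```

Keeping template DNA out of the master mix reflects the per-individual nature of the genomic DNA in these reactions; any other component could be moved into `per_well` the same way.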



Sequencing of PCR Products

The PCR products were purified using a Whatman/Polyfiltronics 100 µl 384-well unifilter
plate essentially according to the manufacturer's protocol. The purified DNA was eluted in 50 µl of
distilled water. Sequencing reactions were set up using Applied Biosystems Big Dye Terminator
chemistry essentially according to the manufacturer's protocol. The purified PCR products were
sequenced in both directions using the primer sets described previously or those represented below by
the nucleotide positions of their initial and final nucleotides, which correspond to positions in Figure 1.
Reaction products were purified by isopropanol precipitation, and run on an Applied Biosystems 3700
DNA Analyzer.
Sequencing Primer Pairs

Fragment No.   Forward Primer    Reverse Primer
Fragment 1     135781-135798     complement of 136316-136336
Fragment 2     136101-136122     complement of 136631-136650
Fragment 3     136159-136180     complement of 136638-136657
Fragment 4     136426-136447     complement of 136905-136925
Fragment 5     136717-136736     complement of 137236-137255
Fragment 6     138789-138808     complement of 139315-139335
Fragment 7     139135-139155     complement of 139635-139656
Fragment 8     139389-139408     complement of 139913-139932
Fragment 9     139671-139691     complement of 140126-140146
Fragment 10    139949-139968     complement of 140427-140448
Fragment 11    140211-140230     complement of 140714-140735



Analysis of Sequences for Polymorphic Sites

Sequence information for a minimum of 80 humans was analyzed for the presence of
polymorphisms using the PolyPhred program (Nickerson et al., Nucleic Acids Res. 25(14):2745-2751,
1997). The presence of a polymorphism was confirmed on both strands. The polymorphisms and their
locations in the SLC26A2 gene are listed in Table 3 below.

Table 3. Polymorphic Sites Identified in the SLC26A2 Gene

Polymorphic                Nucleotide   Reference   Variant   CDS Variant   AA
Site Number   PolyId       Position*    Allele      Allele    Position      Variant
PS1           3759084      136098       G           A
PS2           3759090      136195       A           G
PS3           3759094      139338       T           A         1046          F349Y
PS4**         3759102      140013       C           T         1721          T574I
PS5           3759106      140357       A           T         2065          T689S

*Positions of polymorphic sites in Figure 1.
**Previously reported in the literature.
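Table 3 gives each coding-region site both as a CDS nucleotide position and as an amino-acid change. Assuming the usual convention that CDS position 1 is the first base of the start codon (the table does not state this), the affected residue and the base's position within its codon follow from integer division by 3. The sketch below, with a hypothetical helper name, checks that the CDS positions agree with the residue numbers in the AA Variant column.

```python
def codon_of(cds_position):
    """Map a 1-based CDS nucleotide position to (codon number, position in codon),
    assuming CDS position 1 is the first base of the start codon."""
    if cds_position < 1:
        raise ValueError("CDS positions are 1-based")
    return (cds_position - 1) // 3 + 1, (cds_position - 1) % 3 + 1


# CDS positions and amino-acid changes transcribed from Table 3.
TABLE3_CODING = {
    "PS3": (1046, "F349Y"),
    "PS4": (1721, "T574I"),
    "PS5": (2065, "T689S"),
}

if __name__ == "__main__":
    for ps, (cds, aa) in TABLE3_CODING.items():
        codon, offset = codon_of(cds)
        residue = int(aa[1:-1])  # e.g. "F349Y" -> 349
        print(ps, cds, "-> codon", codon, "base", offset, "| Table 3 residue", residue)
```

Under this convention PS3 and PS4 fall on the second base of codons 349 and 574, and PS5 on the first base of codon 689, matching the residue numbers listed in the table.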

EXAMPLE 2

This example illustrates analysis of the SLC26A2 polymorphisms identified in the Index 
Repository for human genotypes and haplotypes. 

The different genotypes containing these polymorphisms that were observed in the reference 
population are shown in Table 4 below, with the haplotype pair indicating the combination of 
haplotypes determined for the individual using the haplotype derivation protocol described below. In 
Table 4, homozygous positions are indicated by one nucleotide and heterozygous positions are 
indicated by two nucleotides. Missing nucleotides in any given genotype in Table 4 were inferred 
based on linkage disequilibrium and/or Mendelian inheritance.
Table 4. Genotypes and Haplotype Pairs Observed for SLC26A2 Gene

Genotype    Polymorphic Sites                          HAP
Number      PS1    PS2    PS3    PS4    PS5            Pair
1           G      A      T      C      A              3/3
2           A      G      T      C      T              2/2
3           A/G    G/A    T      C/T    T/A            2/5
4           G      A      T      C      A/T            3/4
5           G/A    A/G    T/A    C      A/T            3/1
6           G      A      T      C/T    A              3/5
7           G/A    A/G    T      C      A/T            3/2
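Each genotype in Table 4 can be cross-checked against the haplotypes of Table 5 below: combining the two haplotypes of the assigned pair site by site should reproduce the genotype row. The sketch reads the haplotype strings off Table 5 (alleles at PS1 to PS5) and renders a pair in Table 4 notation; writing heterozygous sites as "first haplotype allele/second haplotype allele" is our reading of the table rather than a stated rule.

```python
# Haplotypes from Table 5: alleles at PS1..PS5, read column by column.
HAPLOTYPES = {1: "AGACT", 2: "AGTCT", 3: "GATCA", 4: "GATCT", 5: "GATTA"}

# Table 4: genotype number -> (haplotype pair, observed genotype at PS1..PS5).
TABLE4 = {
    1: ((3, 3), ["G", "A", "T", "C", "A"]),
    2: ((2, 2), ["A", "G", "T", "C", "T"]),
    3: ((2, 5), ["A/G", "G/A", "T", "C/T", "T/A"]),
    4: ((3, 4), ["G", "A", "T", "C", "A/T"]),
    5: ((3, 1), ["G/A", "A/G", "T/A", "C", "A/T"]),
    6: ((3, 5), ["G", "A", "T", "C/T", "A"]),
    7: ((3, 2), ["G/A", "A/G", "T", "C", "A/T"]),
}


def genotype_from_pair(first, second):
    """Unphased genotype in Table 4 notation: one nucleotide at homozygous
    sites, 'first/second' at heterozygous sites."""
    h1, h2 = HAPLOTYPES[first], HAPLOTYPES[second]
    return [a if a == b else f"{a}/{b}" for a, b in zip(h1, h2)]


if __name__ == "__main__":
    for number, (pair, observed) in TABLE4.items():
        print(number, pair, genotype_from_pair(*pair) == observed)
```

All seven rows are reproduced, which confirms that Tables 4 and 5 are mutually consistent.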



The haplotype pairs shown in Table 4 were estimated from the unphased genotypes using a
computer-implemented extension of Clark's algorithm (Clark, A.G. 1990 Mol Biol Evol 7:111-122) for
assigning haplotypes to unrelated individuals in a population sample, as described in U.S. Provisional
Application Serial No. 60/198,340 entitled "Method and System for Determining Haplotypes from a
Collection of Polymorphisms" and the corresponding application. In
this method, haplotypes are assigned directly from individuals who are homozygous at all sites or
heterozygous at no more than one of the variable sites. This list of haplotypes is then used to
deconvolute the unphased genotypes in the remaining (multiply heterozygous) individuals. In our
analysis, the list of haplotypes was augmented with haplotypes obtained from two families (one
three-generation Caucasian family and one two-generation African-American family).
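The two steps described above can be sketched as a simplified, Clark-style parsimony procedure. This is not the extension described in the cited provisional application: it ignores the family-derived haplotypes, and its tie-breaking rules are our own assumptions (prefer a resolution in which both haplotypes are already known; otherwise prefer the resolution built on the most frequently used known haplotype). Clark-type procedures can depend on processing order, which is why such a rule is needed at all.

```python
from collections import Counter
from itertools import product

# Table 5 haplotypes (alleles at PS1..PS5) and Table 4 unphased genotypes.
HAPLOTYPE_NUMBER = {"AGACT": 1, "AGTCT": 2, "GATCA": 3, "GATCT": 4, "GATTA": 5}
GENOTYPES = {
    1: ["G", "A", "T", "C", "A"],
    2: ["A", "G", "T", "C", "T"],
    3: ["A/G", "G/A", "T", "C/T", "T/A"],
    4: ["G", "A", "T", "C", "A/T"],
    5: ["G/A", "A/G", "T/A", "C", "A/T"],
    6: ["G", "A", "T", "C/T", "A"],
    7: ["G/A", "A/G", "T", "C", "A/T"],
}


def consistent_pairs(genotype):
    """All unordered haplotype pairs that could produce an unphased genotype."""
    sites = [sorted(set(call.split("/"))) for call in genotype]
    het = [i for i, alleles in enumerate(sites) if len(alleles) == 2]
    pairs = set()
    for flips in product((False, True), repeat=len(het)):
        h1 = [alleles[0] for alleles in sites]
        h2 = [alleles[-1] for alleles in sites]
        for i, flip in zip(het, flips):
            if flip:
                h1[i], h2[i] = h2[i], h1[i]
        pairs.add(tuple(sorted(("".join(h1), "".join(h2)))))
    return sorted(pairs)


def clark_style_phase(genotypes):
    known = Counter()  # resolved haplotypes and how often each has been used
    phased = {}
    # Step 1: genotypes with at most one heterozygous site have one solution.
    for gid, genotype in genotypes.items():
        pairs = consistent_pairs(genotype)
        if len(pairs) == 1:
            phased[gid] = pairs[0]
            known.update(pairs[0])
    # Step 2: resolve the remaining genotypes from the growing haplotype list.
    progress = True
    while progress:
        progress = False
        for gid, genotype in genotypes.items():
            if gid in phased:
                continue
            pairs = consistent_pairs(genotype)
            both = [p for p in pairs if p[0] in known and p[1] in known]
            one = [p for p in pairs if p[0] in known or p[1] in known]
            if both:
                choice = both[0]
            elif one:
                # Tie-break (assumption): favour the most-used known haplotype.
                choice = max(one, key=lambda p: known[p[0]] + known[p[1]])
            else:
                continue
            phased[gid] = choice
            known.update(choice)
            progress = True
    return phased


if __name__ == "__main__":
    for gid, pair in sorted(clark_style_phase(GENOTYPES).items()):
        print(gid, [HAPLOTYPE_NUMBER.get(h, h) for h in pair])
```

Run on the seven genotypes of Table 4 alone, this sketch returns the same haplotype pairs as Table 4 (listed in sorted order, so 3/1 appears as 1/3).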

By following this protocol, it was determined that the Index Repository examined herein and,
by extension, the general population contains the 5 human SLC26A2 haplotypes shown in Table 5
below.

An SLC26A2 isogene defined by a full haplotype shown in Table 5 below comprises the
regions of the SEQ ID NOS indicated in Table 5, with their corresponding set of polymorphic
locations and identities, which are also set forth in Table 5.

Table 5. Haplotypes Identified in the SLC26A2 Gene

Haplotype Number(a)     PS       PS              SEQ          Region
1   2   3   4   5       No.(b)   Position(c)     ID NO(d)     Examined(e)
A   A   G   G   G       1        3798/30         24/25        3411-5055
G   G   A   A   A       2        3895/150        24/25        3411-5055
A   T   T   T   T       3        7038/270        24/25        6445-8569
C   C   C   C   T       4        7713/390        24/25        6445-8569
T   T   A   T   A       5        8057/510        24/25        6445-8569

(a) Alleles for SLC26A2 haplotypes are presented 5' to 3' in each column;
(b) PS = polymorphic site;
(c) Position of PS within the indicated SEQ ID NO, with the 1st position number referring to the
1st SEQ ID NO and the 2nd position number referring to the 2nd SEQ ID NO;
(d) 1st SEQ ID NO refers to Figure 1, with the two alternative allelic variants of each
polymorphic site indicated by the appropriate nucleotide symbol; 2nd SEQ ID NO is a
modified version of the 1st SEQ ID NO that comprises the context sequence of each
polymorphic site, PS1-PS5, to facilitate electronic searching of the haplotypes;
(e) Region examined represents the nucleotide positions defining the start and stop positions
within the 1st SEQ ID NO of the sequenced region.
SEQ ID NO:24 refers to Figure 1, with the two alternative allelic variants of each polymorphic
site indicated by the appropriate nucleotide symbol. SEQ ID NO:25 is a modified version of SEQ ID
NO:24 that shows the context sequence of each of PS1-PS5 in uniform format to facilitate electronic
searching of the SLC26A2 haplotypes. For each polymorphic site, SEQ ID NO:25 contains a block of
60 bases of the nucleotide sequence encompassing the centrally-located polymorphic site at the 30th
position, followed by 60 bases of unspecified sequence to represent that each polymorphic site is
separated by genomic sequence whose composition is defined elsewhere herein.
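The uniform layout of SEQ ID NO:25 makes each site's position a simple function of its index: every site occupies a 120-base unit (60 bases of context plus 60 unspecified bases), with the polymorphic base 30th in its unit. The sketch below computes those positions and slices a context block out of a sequence string; the constant and function names are illustrative, and the test uses a synthetic placeholder string rather than the real SEQ ID NO:25.

```python
BLOCK_BASES = 60    # bases of genomic context per polymorphic site
SPACER_BASES = 60   # unspecified bases separating consecutive blocks
SITE_IN_BLOCK = 30  # the polymorphic site is the 30th base of its block


def seq25_position(ps_index):
    """1-based position of PS<ps_index> (PS1 = 1) within SEQ ID NO:25."""
    if ps_index < 1:
        raise ValueError("polymorphic sites are numbered from 1")
    return (ps_index - 1) * (BLOCK_BASES + SPACER_BASES) + SITE_IN_BLOCK


def context_block(seq25, ps_index):
    """The 60-base context block around PS<ps_index> from a SEQ ID NO:25 string."""
    start = (ps_index - 1) * (BLOCK_BASES + SPACER_BASES)  # 0-based slice start
    return seq25[start:start + BLOCK_BASES]


if __name__ == "__main__":
    print([seq25_position(k) for k in range(1, 6)])
```

The computed positions are 30, 150, 270, 390 and 510, the second position numbers listed for PS1-PS5 in Table 5.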

Table 6 below shows the percent of chromosomes characterized by a given SLC26A2
haplotype for all unrelated individuals in the Index Repository for which haplotype data was obtained.
The percent of these unrelated individuals who have a given SLC26A2 haplotype pair is shown in
Table 7. In Tables 6 and 7, the "Total" column shows this frequency data for all of these unrelated
individuals, while the other columns show the frequency data for these unrelated individuals
categorized according to their self-identified ethnogeographic origin. Abbreviations used in Tables 6
and 7 are AF = African Descent, AS = Asian, CA = Caucasian, HL = Hispanic-Latino, and AM =
Native American.

Table 6. Frequency of Observed SLC26A2 Haplotypes In Unrelated Individuals

HAP No.   HAP ID     Total   CA      AF     AS     HL      AM
1         3760324    0.61    2.38    0.0    0.0    0.0     0.0
2         3760318    21.95   19.05   37.5   10.0   19.44   33.33
3         3760317    75.0    76.19   60.0   90.0   75.0    66.67
4         3760323    0.61    0.0     2.5    0.0    0.0     0.0
5         3760322    1.83    2.38    0.0    0.0    5.56    0.0



Table 7. Frequency of Observed SLC26A2 Haplotype Pairs In Unrelated Individuals

HAP1   HAP2   Total   CA      AF     AS     HL      AM
3      3      56.1    57.14   30.0   80.0   61.11   33.33
2      2      4.88    4.76    10.0   0.0    5.56    0.0
2      5      1.22    0.0     0.0    0.0    5.56    0.0
3      4      1.22    0.0     5.0    0.0    0.0     0.0
3      1      1.22    4.76    0.0    0.0    0.0     0.0
3      5      2.44    4.76    0.0    0.0    5.56    0.0
3      2      32.93   28.57   55.0   20.0   22.22   66.67
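Tables 6 and 7 are linked: each individual contributes two chromosomes, so a haplotype's chromosome frequency is (2 × homozygous carriers + heterozygous carriers) / (2N). The sketch below back-calculates individual counts from the Total column of Table 7, assuming N = 82 unrelated individuals (our inference from 1.22% ≈ 1/82; the count is not stated in this section), and recomputes both Total columns.

```python
from collections import Counter

# Individuals per haplotype pair, back-calculated from the Total column of
# Table 7 under the assumption of 82 unrelated individuals (1/82 = 1.22%).
PAIR_COUNTS = {(3, 3): 46, (2, 2): 4, (2, 5): 1, (3, 4): 1,
               (3, 1): 1, (3, 5): 2, (3, 2): 27}


def haplotype_frequencies(pair_counts):
    """Percent of chromosomes carrying each haplotype (two per individual)."""
    chromosomes = Counter()
    for (h1, h2), n in pair_counts.items():
        chromosomes[h1] += n
        chromosomes[h2] += n
    total = sum(chromosomes.values())
    return {h: round(100 * c / total, 2) for h, c in sorted(chromosomes.items())}


def pair_frequencies(pair_counts):
    """Percent of individuals carrying each haplotype pair."""
    n = sum(pair_counts.values())
    return {pair: round(100 * c / n, 2) for pair, c in pair_counts.items()}


if __name__ == "__main__":
    print(haplotype_frequencies(PAIR_COUNTS))
    print(pair_frequencies(PAIR_COUNTS))
```

With these counts the haplotype frequencies come out as 0.61, 21.95, 75.0, 0.61 and 1.83 percent for haplotypes 1 to 5, matching the Total column of Table 6.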



The size and composition of the Index Repository were chosen to represent the genetic
diversity across and within four major population groups comprising the general United States
population. For example, as described in Table 1 above, this repository contains approximately equal
sample sizes of African-descent, Asian-American, European-American, and Hispanic-Latino
population groups. Almost all individuals representing each group had all four grandparents with the
same ethnogeographic background. The number of unrelated individuals in the Index Repository
provides a sample size that is sufficient to detect SNPs and haplotypes that occur in the general
population with high statistical certainty. For instance, a haplotype that occurs with a frequency of 5%
in the general population has a probability higher than 99.9% of being observed in a sample of 80
individuals from the general population. Similarly, a haplotype that occurs with a frequency of 10% in
a specific population group has an approximately 99% probability of being observed in a sample of 20
individuals from that population group. In addition, the size and composition of the Index Repository
means that the relative frequencies determined therein for the haplotypes and haplotype pairs of the
SLC26A2 gene are likely to be similar to the relative frequencies of these SLC26A2 haplotypes and
haplotype pairs in the general U.S. population and in the four population groups represented in the
Index Repository. The genetic diversity observed for the three Native Americans is presented because
it is of scientific interest, but due to the small sample size it lacks statistical significance.
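The detection probabilities quoted above follow from a simple calculation if each individual contributes two chromosomes sampled independently (a Hardy-Weinberg-type assumption not stated in the text): a haplotype of frequency p is missed only if all 2n chromosomes lack it, so it is observed with probability 1 - (1 - p)^(2n). A minimal sketch, with a hypothetical function name:

```python
def detection_probability(freq, individuals):
    """P(at least one copy of a haplotype with population frequency `freq`
    appears among the 2 * individuals chromosomes sampled), assuming the
    chromosomes are sampled independently."""
    return 1 - (1 - freq) ** (2 * individuals)


if __name__ == "__main__":
    # 5% haplotype, 80 individuals from the general population.
    print(round(detection_probability(0.05, 80), 4))
    # 10% haplotype, 20 individuals from one population group.
    print(round(detection_probability(0.10, 20), 4))
```

The first case gives a probability above 99.9%; the second gives about 98.5%, which rounds to the approximately 99% cited for a single population group.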

In view of the above, it will be seen that the several advantages of the invention are achieved
and other advantageous results attained.

As various changes could be made in the above methods and compositions without departing
from the scope of the invention, it is intended that all matter contained in the above description and
shown in the accompanying drawings shall be interpreted as illustrative and not in a limiting sense.

All references cited in this specification, including patents and patent applications, are hereby
incorporated in their entirety by reference. The discussion of references herein is intended merely to
summarize the assertions made by their authors and no admission is made that any reference
constitutes prior art. Applicants reserve the right to challenge the accuracy and pertinency of the cited
references.
What is claimed is:

1. A method for haplotyping the solute carrier family 26, member 2 (SLC26A2) gene of an
individual, which comprises determining which of the SLC26A2 haplotypes shown in the
table immediately below defines one copy of the individual's SLC26A2 gene, wherein each of
the SLC26A2 haplotypes comprises a set of polymorphisms whose locations and identities are
set forth in the table immediately below:



Haplotype Number(a)     PS       PS
1   2   3   4   5       No.(b)   Position(c)
A   A   G   G   G       1        3798
G   G   A   A   A       2        3895
A   T   T   T   T       3        7038
C   C   C   C   T       4        7713
T   T   A   T   A       5        8057

(a) Alleles for haplotypes are presented 5' to 3' in each column;
(b) PS = polymorphic site;
(c) Position of PS within SEQ ID NO:24.



2. The method of claim 1, wherein the determining step comprises identifying the phased
sequence of nucleotides present at each of PS1-PS5 on the one copy of the individual's
SLC26A2 gene.

3. A method for haplotyping the solute carrier family 26, member 2 (SLC26A2) gene of an
individual, which comprises determining which of the SLC26A2 haplotype pairs shown in the
table immediately below defines both copies of the individual's SLC26A2 gene, wherein each
of the SLC26A2 haplotype pairs consists of first and second haplotypes which comprise first
and second sets of polymorphisms whose locations and identities are set forth in the table
immediately below:



Haplotype Pairs(a)                               PS       PS
3/3   2/2   2/5   3/4   3/1   3/5   3/2          No.(b)   Position(c)
G/G   A/A   A/G   G/G   G/A   G/G   G/A          1        3798
A/A   G/G   G/A   A/A   A/G   A/A   A/G          2        3895
T/T   T/T   T/T   T/T   T/A   T/T   T/T          3        7038
C/C   C/C   C/T   C/C   C/C   C/T   C/C          4        7713
A/A   T/T   T/A   A/T   A/T   A/A   A/T          5        8057

(a) Haplotype pairs are represented as 1st haplotype/2nd haplotype, with alleles of each haplotype
shown 5' to 3' as 1st polymorphism/2nd polymorphism in each column;
(b) PS = polymorphic site;
(c) Position of PS in SEQ ID NO:24.

4. The method of claim 3, wherein the determining step comprises identifying the phased 
sequence of nucleotides present at each of PS1-PS5 on both copies of the individual's 
SLC26A2 gene. 

5. A method for genotyping the solute carrier family 26, member 2 (SLC26A2) gene of an
individual, comprising determining for the two copies of the SLC26A2 gene present in the
individual the identity of the nucleotide pair at one or more polymorphic sites (PS) selected
from the group consisting of PS1, PS2, PS3 and PS5, wherein the one or more PS have the
location and alternative alleles shown in SEQ ID NO:24.

6. The method of claim 5, wherein the determining step comprises:

(a) isolating from the individual a nucleic acid mixture comprising both copies of the
SLC26A2 gene, or a fragment thereof, that are present in the individual;

(b) amplifying from the nucleic acid mixture a target region containing the selected
polymorphic site;

(c) hybridizing a primer extension oligonucleotide to one allele of the amplified target
region;

(d) performing a nucleic acid template-dependent, primer extension reaction on the
hybridized genotyping oligonucleotide in the presence of at least two different
terminators of the reaction, wherein said terminators are complementary to the alternative
nucleotides present at the selected polymorphic site; and

(e) detecting the presence and identity of the terminator in the extended genotyping
oligonucleotide.

7. The method of claim 5, which comprises determining for the two copies of the SLC26A2 gene
present in the individual the identity of the nucleotide pair at each of PS1-PS5.

8. A method for haplotyping the solute carrier family 26, member 2 (SLC26A2) gene of an
individual which comprises determining, for one copy of the SLC26A2 gene present in the
individual, the identity of the nucleotide at two or more polymorphic sites (PS) selected from
the group consisting of PS1, PS2, PS3 and PS5, wherein the selected PS have the location and
alternative alleles shown in SEQ ID NO:24.

9. The method of claim 8, further comprising determining the identity of the nucleotide at PS4,
which has the location and alternative alleles shown in SEQ ID NO:24.

10. The method of claim 8, wherein the determining step comprises:

(a) isolating from the individual a nucleic acid sample containing only one of the two copies 
of the SLC26A2 gene, or a fragment thereof, that is present in the individual; 

(b) amplifying from the nucleic acid sample a target region containing the selected 
polymorphic site; 

(c) hybridizing a primer extension oligonucleotide to one allele of the amplified target region; 

(d) performing a nucleic acid template-dependent, primer extension reaction on the
hybridized genotyping oligonucleotide in the presence of at least two different 
terminators of the reaction, wherein said terminators are complementary to the alternative 
nucleotides present at the selected polymorphic site; and 

(e) detecting the presence and identity of the terminator in the extended genotyping 
oligonucleotide. 
11. A method for predicting a haplotype pair for the solute carrier family 26, member 2 (SLC26A2)
gene of an individual comprising:

(a) identifying a SLC26A2 genotype for the individual, wherein the genotype comprises the
nucleotide pair at two or more polymorphic sites (PS) selected from the group consisting
of PS1, PS2, PS3 and PS5, wherein the selected PS have the location and alternative
alleles shown in SEQ ID NO:24;

(b) enumerating all possible haplotype pairs which are consistent with the genotype;

(c) comparing the possible haplotype pairs to the haplotype pair data set forth in the table
immediately below; and

(d) assigning a haplotype pair to the individual that is consistent with the data:


Haplotype Pairs(a)                               PS       PS
3/3   2/2   2/5   3/4   3/1   3/5   3/2          No.(b)   Position(c)
G/G   A/A   A/G   G/G   G/A   G/G   G/A          1        3798
A/A   G/G   G/A   A/A   A/G   A/A   A/G          2        3895
T/T   T/T   T/T   T/T   T/A   T/T   T/T          3        7038
C/C   C/C   C/T   C/C   C/C   C/T   C/C          4        7713
A/A   T/T   T/A   A/T   A/T   A/A   A/T          5        8057

(a) Haplotype pairs are represented as 1st haplotype/2nd haplotype, with alleles of each haplotype
shown 5' to 3' as 1st polymorphism/2nd polymorphism in each column;
(b) PS = polymorphic site;
(c) Position of PS in SEQ ID NO:24.

12. The method of claim 11, wherein the identified genotype of the individual comprises the
nucleotide pair at each of PS1-PS5, which have the location and alternative alleles shown in
SEQ ID NO:24.

13. A method for identifying an association between a trait and at least one haplotype or haplotype
pair of the solute carrier family 26, member 2 (SLC26A2) gene which comprises comparing the
frequency of the haplotype or haplotype pair in a population exhibiting the trait with the
frequency of the haplotype or haplotype pair in a reference population, wherein the haplotype is
selected from haplotypes 1-5 shown in the table presented immediately below, wherein each of
the haplotypes comprises a set of polymorphisms whose locations and identities are set forth in
the table immediately below:



Haplotype Number(a)     PS       PS
1   2   3   4   5       No.(b)   Position(c)
A   A   G   G   G       1        3798
G   G   A   A   A       2        3895
A   T   T   T   T       3        7038
C   C   C   C   T       4        7713
T   T   A   T   A       5        8057

(a) Alleles for haplotypes are presented 5' to 3' in each column;
(b) PS = polymorphic site;
(c) Position of PS in SEQ ID NO:24;



and wherein the haplotype pair is selected from the haplotype pairs shown in the table
immediately below, wherein each of the SLC26A2 haplotype pairs consists of first and second
haplotypes which comprise first and second sets of polymorphisms whose locations and
identities are set forth in the table immediately below:



Haplotype Pairs(a)                               PS       PS
3/3   2/2   2/5   3/4   3/1   3/5   3/2          No.(b)   Position(c)
G/G   A/A   A/G   G/G   G/A   G/G   G/A          1        3798
A/A   G/G   G/A   A/A   A/G   A/A   A/G          2        3895
T/T   T/T   T/T   T/T   T/A   T/T   T/T          3        7038
C/C   C/C   C/T   C/C   C/C   C/T   C/C          4        7713
A/A   T/T   T/A   A/T   A/T   A/A   A/T          5        8057

(a) Haplotype pairs are represented as 1st haplotype/2nd haplotype, with alleles of each haplotype
shown 5' to 3' as 1st polymorphism/2nd polymorphism in each column;
(b) PS = polymorphic site;
(c) Position of PS in SEQ ID NO:24;

wherein a higher frequency of the haplotype or haplotype pair in the trait population than in the
reference population indicates the trait is associated with the haplotype or haplotype pair.

14. The method of claim 13, wherein the trait is a clinical response to a drug targeting SLC26A2.

15. An isolated genotyping oligonucleotide for detecting a polymorphism in the solute carrier
family 26, member 2 (SLC26A2) gene at a polymorphic site (PS) selected from the group
consisting of PS1, PS2, PS3 and PS5, wherein the selected PS have the location and alternative
alleles shown in SEQ ID NO:24.

16. The isolated genotyping oligonucleotide of claim 15, which is an allele-specific oligonucleotide
that specifically hybridizes to an allele of the SLC26A2 gene at a region containing the 
polymorphic site. 

17. The allele-specific oligonucleotide of claim 16, which comprises a nucleotide sequence selected
from the group consisting of SEQ ID NOS:4-7, the complements of SEQ ID NOS:4-7, and SEQ
ID NOS:8-15.

18. The isolated genotyping oligonucleotide of claim 15, which is a primer-extension
oligonucleotide.

19. The primer-extension oligonucleotide of claim 18, which comprises a nucleotide sequence
selected from the group consisting of SEQ ID NOS:16-23.

20. A kit for genotyping the solute carrier family 26, member 2 (SLC26A2) gene of an individual,
which comprises a set of oligonucleotides designed to genotype each of polymorphic sites (PS)
PS1, PS2, PS3 and PS5, wherein the selected PS have the location and alternative alleles shown
in SEQ ID NO:24.

21. The kit of claim 20, which further comprises oligonucleotides designed to genotype PS4, 
having the location and alternative alleles shown in SEQ ID NO:24. 

22. An isolated polynucleotide comprising a nucleotide sequence selected from the group consisting
of:

(a) a first nucleotide sequence which comprises a solute carrier family 26, member 2
(SLC26A2) isogene, wherein the SLC26A2 isogene is selected from the group consisting
of isogenes 1-2 and 4-5 shown in the table immediately below and wherein each of the
isogenes comprises the regions of the SEQ ID NOS shown in the table immediately below
and wherein each of the isogenes 1-2 and 4-5 is further defined by the corresponding set
of polymorphisms whose locations and identities are set forth in the table immediately
below:



Isogene Number*          PS      PS            Region
1   2   3   4   5        No.**   Position***   Examined****
A   A   G   G   G        1       3798          3411-5055
G   G   A   A   A        2       3895          3411-5055
A   T   T   T   T        3       7038          6445-8569
C   C   C   C   T        4       7713          6445-8569
T   T   A   T   A        5       8057          6445-8569

*Alleles for isogenes are presented 5' to 3' in each column;
**PS = polymorphic site;
***Position of PS in SEQ ID NO:24;
****Region examined represents the nucleotide positions defining the start and stop positions
of the sequenced region within SEQ ID NO:24;

(b) a second nucleotide sequence which comprises a fragment of the first nucleotide
sequence, wherein the fragment comprises one or more polymorphisms selected from the
group consisting of adenine at PS1, guanine at PS2, adenine at PS3 and thymine at PS5,
wherein the selected polymorphism has the location set forth in the table immediately
above; and

(c) a third nucleotide sequence which is complementary to the first or second nucleotide 
sequence. 
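The table in claim 22 defines each isogene by its alleles at PS1-PS5. A minimal Python sketch of that lookup follows; the function names and the placeholder sequence are illustrative assumptions, and the allele strings are simply the table columns read top to bottom (PS1 to PS5, positions in SEQ ID NO:24):

```python
# PS1-PS5 locations in SEQ ID NO:24 (1-based), from the claim 22 table.
PS_POSITIONS = (3798, 3895, 7038, 7713, 8057)

# Alleles at PS1..PS5 for isogenes 1-5, read down each column of the table.
ISOGENES = {1: "AGACT", 2: "AGTCT", 3: "GATCA", 4: "GATCT", 5: "GATTA"}


def alleles_at_sites(sequence: str) -> str:
    """Return the bases at PS1..PS5 of a sequence numbered like SEQ ID NO:24."""
    if len(sequence) < max(PS_POSITIONS):
        raise ValueError("sequence does not cover all polymorphic sites")
    return "".join(sequence[p - 1].upper() for p in PS_POSITIONS)


def matching_isogenes(alleles: str) -> list:
    """Return the numbers of the isogenes whose alleles equal the observed ones."""
    return [n for n, h in ISOGENES.items() if h == alleles.upper()]


# Usage with a placeholder sequence (N everywhere except the five sites).
demo = ["N"] * 8569
for pos, base in zip(PS_POSITIONS, "GATTA"):
    demo[pos - 1] = base
print(matching_isogenes(alleles_at_sites("".join(demo))))
```

An empty result means the observed combination is not one of the five listed isogenes.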

23. The isolated polynucleotide of claim 22, which is a DNA molecule and comprises both the first
and third nucleotide sequences and further comprises expression regulatory elements operably
linked to the first nucleotide sequence.

24. A recombinant nonhuman organism transformed or transfected with the isolated polynucleotide
of claim 22, wherein the organism expresses a SLC26A2 protein encoded by the first nucleotide
sequence.

25. The recombinant nonhuman organism of claim 24, which is a transgenic animal. 

26. The isolated polynucleotide of claim 22 which consists of the second nucleotide sequence.

27. An isolated polynucleotide comprising a nucleotide sequence selected from the group 
consisting of: 

(a) a coding sequence for a solute carrier family 26, member 2 (SLC26A2) isogene wherein
the coding sequence is defined by a haplotype selected from the group consisting of 1, 2, 4
and 5 shown in the table immediately below and wherein the coding sequence comprises
SEQ ID NO:2 except at each of the polymorphic sites which have the locations and
polymorphisms set forth in the table immediately below:



Coding Sequence Haplotype Number*    PS      PS
1c     2c, 4c     5c                 No.**   Position***
A      T          T                  3       1046
C      C          T                  4       1721
T      T          A                  5       2065

*Alleles for coding sequence haplotypes are presented 5' to 3' in each column; the
numerical portion of the coding sequence haplotype number represents the number of the
parent full SLC26A2 haplotype;
**PS = polymorphic site;
***Position of PS in SEQ ID NO:2;

and 

(b) a fragment of the coding sequence, wherein the fragment comprises at least one
polymorphism selected from the group consisting of adenine at a position corresponding to
nucleotide 1046 and thymine at a position corresponding to nucleotide 2065, wherein said
positions in the coding sequence and the fragment refer to SEQ ID NO:2.

28. A recombinant nonhuman organism transformed or transfected with the isolated polynucleotide 
of claim 27, wherein the organism expresses a solute carrier family 26, member 2 (SLC26A2)
protein encoded by the polymorphic variant sequence. 

29. The recombinant nonhuman organism of claim 28, which is a transgenic animal. 

30. An isolated polypeptide comprising an amino acid sequence which is a polymorphic variant of a 
reference sequence for the solute carrier family 26, member 2 (SLC26A2) protein or a fragment 
thereof, wherein the reference sequence comprises SEQ ID NO:3 and the polymorphic variant
comprises one or more variant amino acids selected from the group consisting of tyrosine at a 
position corresponding to amino acid position 349 and serine at a position corresponding to 
amino acid position 689. 
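Claim 27(b) places the variant bases at coding positions 1046 and 2065 of SEQ ID NO:2, and claim 30 places the amino acid changes at residues 349 and 689. The link is codon arithmetic: coding position p falls in codon (p - 1) div 3 + 1. The Python sketch below makes that explicit. The function names are illustrative, and it assumes that the coding sequence printed in the coding-sequence figure of this document is SEQ ID NO:2; the two 50-nt fragments are copied from that figure (positions 1001-1050 and 2051-2100).

```python
BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate_codon(codon: str) -> str:
    b1, b2, b3 = (BASES.index(b) for b in codon.upper())
    return AMINO[16 * b1 + 4 * b2 + b3]


def snp_effect(fragment: str, first_pos: int, pos: int, alt: str):
    """Return (codon number, reference amino acid, variant amino acid).

    fragment: excerpt of the coding sequence whose first base is coding position first_pos
    pos: 1-based coding position of the SNP (position 1 = A of the ATG start codon)
    """
    codon_no = (pos - 1) // 3 + 1
    codon_start = 3 * (codon_no - 1) + 1
    i = codon_start - first_pos
    if i < 0 or i + 3 > len(fragment):
        raise ValueError("codon not contained in fragment")
    ref = fragment[i:i + 3].upper()
    j = pos - codon_start
    var = ref[:j] + alt.upper() + ref[j + 1:]
    return codon_no, translate_codon(ref), translate_codon(var)


# Coding positions 1001-1050 and 2051-2100 as printed in the coding-sequence figure.
CDS_1001 = "CTATTGAACTTGTTGTTGTTGTAGCAGCCACATTAGCCTCTCATTTTGGA"
CDS_2051 = "CTCAGTGCAATCCCACTGTGAGGGATTCCCTAACCAACGGAGAATATTGC"

print(snp_effect(CDS_1001, 1001, 1046, "A"))  # (349, 'F', 'Y')
print(snp_effect(CDS_2051, 2051, 2065, "T"))  # (689, 'T', 'S')
```

The printed results match claim 30: TTT (Phe) becomes TAT (Tyr) at residue 349, and ACT (Thr) becomes TCT (Ser) at residue 689.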

31. An isolated monoclonal antibody specific for and immunoreactive with the isolated polypeptide 
of claim 30. 

32. A method for screening for drugs targeting the isolated polypeptide of claim 30 which 
comprises contacting the SLC26A2 polymorphic variant with a candidate agent and assaying for 
binding activity. 

33. A computer system for storing and analyzing polymorphism data for the solute carrier family
26, member 2 gene, comprising:

(a) a central processing unit (CPU); 

(b) a communication interface; 

(c) a display device; 

(d) an input device; and 

(e) a database containing the polymorphism data;

wherein the polymorphism data comprises the haplotypes set forth in the table immediately
below:



Haplotype Number*        PS      PS
1   2   3   4   5        No.**   Position***
A   A   G   G   G        1       3798
G   G   A   A   A        2       3895
A   T   T   T   T        3       7038
C   C   C   C   T        4       7713
T   T   A   T   A        5       8057

*Alleles for haplotypes are presented 5' to 3' in each column;
**PS = polymorphic site;
***Position of PS in SEQ ID NO:24;



and the haplotype pairs set forth in the table immediately below:



Haplotype Pairs*                                PS      PS
3/3   2/2   2/5   3/4   3/1   3/5   3/2         No.**   Position***
G/G   A/A   A/G   G/G   G/A   G/G   G/A         1       3798
A/A   G/G   G/A   A/A   A/G   A/A   A/G         2       3895
T/T   T/T   T/T   T/T   T/A   T/T   T/T         3       7038
C/C   C/C   C/T   C/C   C/C   C/T   C/C         4       7713
A/A   T/T   T/A   A/T   A/T   A/A   A/T         5       8057

*Haplotype pairs are represented as 1st haplotype/2nd haplotype, with the alleles of each
haplotype shown 5' to 3' as 1st polymorphism/2nd polymorphism in each column;
**PS = polymorphic site;
***Location of PS in SEQ ID NO:24.
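Each column of the haplotype-pair table follows mechanically from the haplotype table: the genotype at each PS is the allele of the first haplotype over the allele of the second. The reverse question (which pairs of the listed haplotypes could explain an unphased genotype) can be answered by enumeration. A minimal Python sketch of both, assuming that both chromosomes of an individual carry one of the five listed haplotypes (function names are illustrative):

```python
from itertools import combinations_with_replacement

PS_POSITIONS = (3798, 3895, 7038, 7713, 8057)  # PS1-PS5 in SEQ ID NO:24
HAPLOTYPES = {1: "AGACT", 2: "AGTCT", 3: "GATCA", 4: "GATCT", 5: "GATTA"}


def pair_genotype(first: int, second: int) -> dict:
    """Genotype at each PS for a haplotype pair, written first/second allele."""
    h1, h2 = HAPLOTYPES[first], HAPLOTYPES[second]
    return {pos: f"{a}/{b}" for pos, a, b in zip(PS_POSITIONS, h1, h2)}


def compatible_pairs(genotype: dict) -> list:
    """Unordered haplotype pairs that explain an unphased genotype.

    genotype maps each PS position to a string such as "G/A"; allele order is ignored.
    """
    observed = [sorted(genotype[pos].split("/")) for pos in PS_POSITIONS]
    matches = []
    for i, j in combinations_with_replacement(sorted(HAPLOTYPES), 2):
        predicted = [sorted((a, b)) for a, b in zip(HAPLOTYPES[i], HAPLOTYPES[j])]
        if predicted == observed:
            matches.append((i, j))
    return matches


# Reproduce the "3/1" column of the haplotype-pair table, then invert it.
g = pair_genotype(3, 1)
print(g)  # {3798: 'G/A', 3895: 'A/G', 7038: 'T/A', 7713: 'C/C', 8057: 'A/T'}
print(compatible_pairs(g))  # [(1, 3)]
```

For these five haplotypes each of the tabulated genotypes resolves to a single pair; a genotype returning an empty list would point to a haplotype not in the table.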
34. A genome anthology for the solute carrier family 26, member 2 (SLC26A2) gene which
comprises SLC26A2 isogenes defined by any one of haplotypes 1-5 set forth in the table shown
below:



Haplotype Number*        PS      PS
1   2   3   4   5        No.**   Position***
A   A   G   G   G        1       3798
G   G   A   A   A        2       3895
A   T   T   T   T        3       7038
C   C   C   C   T        4       7713
T   T   A   T   A        5       8057

*Alleles for haplotypes are presented 5' to 3' in each column;
**PS = polymorphic site;
***Position of PS in SEQ ID NO:24.
POLYMORPHISMS IN THE SLC26A2 GENE

[Figure: SLC26A2 genomic sequence, ruler positions 132400 to 134700. The multi-column sequence text of this page is not recoverable from the extracted copy.]
[Figure, continued: SLC26A2 genomic sequence, ruler positions 134800 to 136900. Annotations: exon 1, 136351..137049; alternative alleles A (PS1) and G (PS2) marked at their polymorphic sites. The multi-column sequence text of this page is not recoverable from the extracted copy.]
[Figure, continued: SLC26A2 genomic sequence, ruler positions 137100 to 139400. Annotations: start of exon 2 at 138992; alternative allele A (PS3) marked at its polymorphic site. The multi-column sequence text of this page is not recoverable from the extracted copy.]
CCATTGGCTT TTGTAATATC ATCCCTTCCT TCTTCCACTG TTTTACTACT 139600
AGTGCAGCTC TTGCAAAGAC ATTGGTTAAA GAATCAACAG GCTGCCATAC 
TCAGCTTTCT GGTGTGGTAA CAGCCCTGGT TCTTTTGTTG GTCCTCCTAG 139700 
TAATAGCTCC TTTGTTCTAT TCCCTTCAAA AAAGTGTCCT TGGTGTGATC 
ACAATTGTAA ATCTACGGGG AGCCCTTCGT AAATTTAGGG ATCTTCCCAA 139800 
AATGTGGAGT ATTAGTAGAA TGGATACAGT TATCTGGTTT GTTACTATGC 
TGTCCTCTGC ACTGCTAAGT ACTGAAATAG GCCTACTTGT TGGGGTTTGT 139900 
TTTTCTATAT TTTGTGTCAT CCTCCGCACT CAGAAGCCAA AGAGTTCACT 
GCTTGGCTTG GTGGAAGAGT CTGAGGTCTT TGAATCTGTG TCTGCTTACA 140000 
AGAACCTTCA GACTAAGCCA GGCATCAAGA TTTTCCGCTT TGTAGCCCCT 
T 

CTCTACTACA TAAACAAAGA ATGCTTTAAA TCTGCTTTAT ACAAACAAAC 140100 
TGTCAACCCA ATCTTAATAA AGGTGGCTTG GAAGAAGGCA GCAAAGAGAA 
AGATCAAAGA AAAAGTAGTG ACTCTTGGTG GAATCCAGGA TGAAATGTCA 140200 
GTGCAACTTT CCCATGATCC CTTGGAGCTG CATACTATAG TGATTGACTG 
CAGTGCAATT CAATTTTTAG ATACAGCAGG GATCCACACA CTGAAAGAAG 140300
TTCGCAGAGA TTATGAAGCC ATTGGAATCC AGGTTCTGCT GGCTCAGTGC 
AATCCCACTG TGAGGGATTC CCTAACCAAC GGAGAATATT GCAAAAAGGA 140400 
T 

AGAAGAAAAC CTTCTCTTCT ATAGTGTGTA TGAAGCGATG GCTTTTGCAG 
AAGTATCTAA AAATCAGAAA GGAGTATGTG TTCCCAATGG TCTGAGTCTT 140500 
AGTAGTGATT AATTGAGAAG GTAGATAGAA GAATGTCTAG CCAATAGGTT 
[exon 2 ends: 140512]

AAAATTTCAA GTGTCCAACA TTTCCCAGTT CCACAGTGGG AAATTTTGCA 140600
CACTTGAAAT TTTAACCAAG TGGCTAGATA TTATTCCTCC TTTGAAGCTA
ATGGCATTTG TATATACACA CTGCAGCAGA GCTTGTAGCT GGACAGAGTC 140700
AAAAAGAAGA AAATACGGTT TCAGGCTTTC TTGCAGATAT GAAGTATTCT
TGGAATGCAA TAAGTATGTA TTGAACTGTA GTGTAAAGTA GCTCCAAAAC 140800
TTAATTACTC TCCTGTTTTA GGGGTTATAC ATTTGGACTG TGCATTCTCC
AAGAGATGAA GCGGTGAAGT TGGGATTTAC ATTGGAAGTG CTGTAGACTT 140900
CTTTATGTGG CTCAGTGGAG AGAGGGAAAG AATGTTGCAC CTGCTCTAGT
ACCATAGGTC AAGAGGCTTC TGGATCACAA AGTCATAACT AGACAGGTTT 141000
GTTCTTGTAG TTTTCTATCC CCAGTCTTTG CTCCCCAGAT GGCAGTAGTT
TTTAGTAGGA AAGTGCCATT CCTGTCCTTA AGGCACAGTC TCATCAGAAG 141100
TCTAATACCT GGGCAGGTTT ATAACATCCT GAGAGCCAGC CTGACATTAG
ACAGAATACC CTTTGTAATA CATTGGAAAT TTTTACTCAT GCCTTTTTGT 141200
TTAGGATAAA TAGGTAAGCA CAAAGAGCTC TTCAAAATCA GAAAAAACAA
TAGGAGTCCT TCCTTGTCTT TTCTGTGATC TCTGTCCTTG TTTCTGAGAC 141300
TTTCTCTACC ATTAAGCTCT ATTTTAGCTT TCAGTTATTC TAGTTTGTTT
CCCATGGAAT CTGTCCTAAA CTGGTGTTTT TGTCAGTGAC AGTCTTGCCA 141400
GTCAGCAATT TCTAACAGCA TTTTAAATGA GTTTGATGTA CAGTAAATAT
TGATGACAAT GACAGCTTTT AACTCTTCAA GTCACCTAAA GCTATTATGC 141500
AGGAGGATTT AGAAGTCACA TTCATAAAAC CCAAGGGCTA TGGGTGTATT
ATTCATGATA GCTGGCCCAC AGGTCATGAA TTGAGGAGGA ATTTGCTTTC 141600
AAAAAGCAAG AATGTCCAAC ACTGAAAGTT TATAGTTTTA TATTTGGACC
TTGAAAGGTA AGAAAAAACC AGGTTCTCCA AAGTTAGGAA TAGGGAACTA 141700
ATTTATGAAA CAGCCATCTT AAAAAAAAAA AAAGTAAACT GCAAAAGTAC
AAAATCATTT TTCAATCTGT TCCCAGTTTC TAAACAATTT TAAATATTTA 141800
TGAGAAGCAA ACCCTATGTG TAGGGCATCT GTTGGAGTGG GATGCTTTTA
GACATATATT AAGTATGTAC ATGTTTAATA TGTATATTTA AAATGCATAT 141900
ATATTTTATT ATATCTATAT TATCCTATAT AGATATATGT AACTTAGCTT
[Figure, continued: SLC26A2 genomic sequence, ruler positions 142000 to 144512. The multi-column sequence text of this page is not recoverable from the extracted copy.]
POLYMORPHISMS IN THE CODING SEQUENCE OF SLC26A2

ATGTCTTCAG AAAGTAAAGA GCAACATAAC GTTTCACCCA GAGACTCAGC 
TGAAGGAAAT GACAGTTATC CATCTGGGAT CCATCTGGAA CTTCAAAGGG 100 
AATCAAGTAC TGACTTCAAG CAATTTGAGA CCAATGATCA ATGCAGACCT 
TATCATAGGA TCCTTATTGA GCGTCAAGAG AAATCAGATA CAAACTTCAA 200 
GGAGTTTGTT ATTAAAAAGC TGCAGAAGAA TTGCCAGTGC AGTCCAGCCA 
AAGCCAAAAA TATGATTTTA GGTTTCCTTC CTGTTTTGCA GTGGCTCCCA 300 
AAATACGACC TAAAGAAAAA CATTTTAGGG GATGTGATGT CAGGCTTGAT 
TGTGGGCATA TTATTGGTGC CCCAGTCCAT TGCTTATTCC CTGCTGGCTG 400 
GCCAAGAACC TGTCTATGGT CTGTACACAT CTTTTTTTGC CAGCATCATT 
TATTTTCTCT TGGGTACCTC CCGTCACATC TCTGTGGGCA TTTTTGGAGT 500
ACTGTGCCTT ATGATTGGTG AGACAGTTGA CCGAGAACTA CAGAAAGCTG
GCTATGACAA TGCCCATAGT GCTCCTTCCT TAGGAATGGT TTCAAATGGG 600
AGCACATTAT TAAATCATAC ATCAGACAGG ATATGTGACA AAAGTTGCTA
TGCAATTATG GTTGGCAGCA CTGTAACCTT TATAGCTGGA GTTTATCAGG 700
TAGCGATGGG CTTCTTTCAA GTGGGTTTTG TTTCTGTCTA CCTCTCAGAT
GCCTTGCTGA GTGGATTTGT CACTGGTGCC TCCTTCACTA TTCTTACATC 800 
TCAGGCCAAG TATCTTCTTG GGCTCAACCT TCCTCGGACT AATGGTGTGG 
GCTCACTCAT CACTACCTGG ATACATGTCT TCAGAAACAT CCATAAGACC 900 
AATCTCTGTG ATCTTATCAC CAGCCTTTTG TGCCTTTTGG TTCTTTTGCC 
AACCAAAGAA CTCAATGAAC ACTTCAAATC CAAGCTTAAG GCACCGATTC 1000 
CTATTGAACT TGTTGTTGTT GTAGCAGCCA CATTAGCCTC TCATTTTGGA 

A 

AAACTACATG AAAATTATAA TTCTAGTATT GCTGGACATA TTCCCACTGG 1100
GTTTATGCCA CCCAAAGTAC CAGAATGGAA CCTAATTCCT AGTGTGGCTG 
TAGATGCAAT AGCTATTTCC ATCATTGGTT TTGCTATCAC TGTATCACTT 1200 
TCTGAGATGT TTGCCAAGAA ACATGGTTAC ACAGTCAAAG CAAACCAGGA 
AATGTATGCC ATTGGCTTTT GTAATATCAT CCCTTCCTTC TTCCACTGTT 1300 
TTACTACTAG TGCAGCTCTT GCAAAGACAT TGGTTAAAGA ATCAACAGGC 
TGCCATACTC AGCTTTCTGG TGTGGTAACA GCCCTGGTTC TTTTGTTGGT 1400
CCTCCTAGTA ATAGCTCCTT TGTTCTATTC CCTTCAAAAA AGTGTCCTTG 
GTGTGATCAC AATTGTAAAT CTACGGGGAG CCCTTCGTAA ATTTAGGGAT 1500 
CTTCCCAAAA TGTGGAGTAT TAGTAGAATG GATACAGTTA TCTGGTTTGT 
TACTATGCTG TCCTCTGCAC TGCTAAGTAC TGAAATAGGC CTACTTGTTG 1600 
GGGTTTGTTT TTCTATATTT TGTGTCATCC TCCGCACTCA GAAGCCAAAG 
AGTTCACTGC TTGGCTTGGT GGAAGAGTCT GAGGTCTTTG AATCTGTGTC 1700 
TGCTTACAAG AACCTTCAGA CTAAGCCAGG CATCAAGATT TTCCGCTTTG 

T 

TAGCCCCTCT CTACTACATA AACAAAGAAT GCTTTAAATC TGCTTTATAC 1800 
AAACAAACTG TCAACCCAAT CTTAATAAAG GTGGCTTGGA AGAAGGCAGC 
AAAGAGAAAG ATCAAAGAAA AAGTAGTGAC TCTTGGTGGA ATCCAGGATG 1900 
AAATGTCAGT GCAACTTTCC CATGATCCCT TGGAGCTGCA TACTATAGTG 
ATTGACTGCA GTGCAATTCA ATTTTTAGAT ACAGCAGGGA TCCACACACT 2000 
GAAAGAAGTT CGCAGAGATT ATGAAGCCAT TGGAATCCAG GTTCTGCTGG 
CTCAGTGCAA TCCCACTGTG AGGGATTCCC TAACCAACGG AGAATATTGC 2100 
T 

AAAAAGGAAG AAGAAAACCT TCTCTTCTAT AGTGTGTATG AAGCGATGGC
TTTTGCAGAA GTATCTAAAA ATCAGAAAGG AGTATGTGTT CCCAATGGTC 2200
TGAGTCTTAG TAGTGATTAA 2220 
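The coding sequence ends at position 2220, which is 740 codons: 739 amino acids, matching the 739-residue protein figure, plus a stop codon. Position 2101 is the first base of codon 701 (2101 = 3 x 700 + 1), so translating the last three rows of the figure should reproduce residues 701-739 of the isoform figure followed by a stop. A minimal Python check of that consistency follows. The row-parsing helper is an illustrative assumption, the standard genetic code is used, and the row text is the figure text with the OCR errors corrected.

```python
import re

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def parse_rows(rows):
    """Join figure rows into one sequence, dropping spaces and position labels."""
    return "".join(re.sub(r"[^ACGT]", "", row.upper()) for row in rows)


def translate(seq: str) -> str:
    """Translate an in-frame DNA sequence with the standard genetic code."""
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    out = []
    for k in range(0, len(seq), 3):
        b1, b2, b3 = (BASES.index(b) for b in seq[k:k + 3])
        out.append(AMINO[16 * b1 + 4 * b2 + b3])
    return "".join(out)


# Last three rows of the coding-sequence figure (positions 2101-2220).
tail = parse_rows([
    "AAAAAGGAAG AAGAAAACCT TCTCTTCTAT AGTGTGTATG AAGCGATGGC",
    "TTTTGCAGAA GTATCTAAAA ATCAGAAAGG AGTATGTGTT CCCAATGGTC 2200",
    "TGAGTCTTAG TAGTGATTAA 2220",
])
print(len(tail), translate(tail))
```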
ISOFORMS OF THE SLC26A2 PROTEIN

MSSESKEQHN VSPRDSAEGN DSYPSGIHLE LQRESSTDFK QFETNDQCRP 
YHRILIERQE KSDTNFKEFV IKKLQKNCQC SPAKAKNMIL GFLPVLQWLP 100 
KYDLKKNILG DVMSGLIVGI LLVPQSIAYS LLAGQEPVYG LYTSFFASII 
YFLLGTSRHI SVGIFGVLCL MIGETVDREL QKAGYDNAHS APSLGMVSNG 200 
STLLNHTSDR ICDKSCYAIM VGSTVTFIAG VYQVAMGFFQ VGFVSVYLSD 
ALLSGFVTGA SFTILTSQAK YLLGLNLPRT NGVGSLITTW IHVFRNIHKT 300 
NLCDLITSLL CLLVLLPTKE LNEHFKSKLK APIPIELVVV VAATLASHFG

Y
KLHENYNSSI AGHIPTGFMP PKVPEWNLIP SVAVDAIAIS IIGFAITVSL 400
SEMFAKKHGY TVKANQEMYA IGFCNIIPSF FHCFTTSAAL AKTLVKESTG 
CHTQLSGVVT ALVLLLVLLV IAPLFYSLQK SVLGVITIYN LRGALRKFRD 500
LPKMWSISRM DTVIWFVTML SSALLSTEIG LLVGVCFSIF CVILRTQKPK 
SSLLGLVEES EVFESVSAYK NLQTKPGIKI FRFVAPLYYI NKECFKSALY 600 

I

KQTVNPILIK VAWKKAAKRK IKEKVVTLGG IQDEMSVQLS HDPLELHTIV
IDCSAIQFLD TAGIHTLKEV RRDYEAIGIQ VLLAQCNPTV RDSLTNGEYC 700 

S 

KKEEENLLFY SVYEAMAFAE VSKNQKGVCV PNGLSLSSD 739
SEQUENCE LISTING

<110> Genaissance Pharmaceuticals, Inc. 
Kliem, Stefanie S.
Koshy, Beena 
Tanguay, Debra A. 

<120> HAPLOTYPES OF THE SLC26A2 GENE 

<130> SLC26A2 MWH0849-PCT 

<140> TBA 

<141> 2001-06-22 



<150> 60/213,284 
<151> 2000-06-22

<160> 25 

<170> PatentIn Ver. 2.1

<210> 1 
<211> 12212 
<212> DNA 
<213> Homo sapiens 

<400> 1 

gtgtttatct gagaaagttt ttatttcttt ttcatatttt attttattta agacagagtc 60
tctccctgtc acccaggcta aagtgcagtg gtgtgatcat ggcttattgc agcctcaacc 120 
tcttaggctc aagcagtctc cctacctcag cctcccaagt agctgggacc ataggcacac 180 
accaccatgc ctggcctctt atcttttaaa atattttttc taggtaaaga attctatgtt 240 
aaaagttata tttctggggt cagatgtggt ggctcacacc tgtaatccta gcactttggg 300
aggccagggt gggaggatca cttgaggcca agagtttggg aaccagcctg gggagcttag 360 
tagcaagacc ttgtctctac aaaaaaaatt aaaaaatgag ccaggcttgg tcgtgcacac 420 
ctgtaatcct cgctactcag gaggctgagg tgggagaatc acttgagccc aggagttcga 480 
ggccatggtg agccataatt gcatgtcact gcacttcagc ctgggcaaca gggcaagacc 540 
ctatttctat aaaagatata tatttctttg atacaaagat accactccat catgttctgg 600 
cctccatgat ttccaataag aagtctgcta taattcttat ttttgtttct gtataagtaa 660 
tatatgtttt ttctttgtct tcaggatttt ctctttatct tttgttttca gcagcttgaa 720 
tatcatatgt gtaggtgagg gtttttgttt tctttttgtt tttatattta ctcttcttgg 780 
aattctctaa gattcttgga gatagattat atattatata tatatgtatt ttttttttga 840 
gacagagttt tgctttgtca ccatgctgga gtgcagtggc gcgatcttgg ctcactgcag 900 
cctccgcctc ccaggttcaa gcaattctcc tgcctcagcc tcctgagtag ctgggactac 960 
aggagcgcac caccatgccc agctaatttt tgtatttttt tagaagagac ggggtttcac 1020 
catgttggcc aggatggtct tgatctcttg accttgtgat ctgcctgcct tggcctccgg 1080 
aagtgttgga attacaggtg tgagccactg cacccggcct tttttttttt tttttttttt 1140 
gagaaggagt tttcctcttg ctgtccagac aggagtgcaa tggcgtgatc tcggctcact 1200 
gcaacctcca gttcctgggt tccagcgatt ctcctgcctc ggcctcgcga gtagctggga 1260 
ttacaggcgt gtgccaccat gcctgactaa ttttatattt ttagtagaga ccctgtttca 1320
ccatgttggt catgctggtc ttgaactcct ggcctcaggt tacttggatc tatattttga 1380
tgtctcttat taattttgga aatttcttgg ccattatatc ttgaaatatt tcttctgccc 1440
attctctctc ttcttttggt attctaatta tgcatatgtt tgatactttg atgtgtcccg 1500 
cagctcttgg atgttatgtt acattgttga tattggggtt tttttgtttt gtttcgttat 1560 
ttaaactttt ctgcgtttta gtttagctaa tttctgttta tctgtctttt tgagttcatc 1620 
gattctttct ttggctttcg ggtctagtaa tgatcctatc gaaggcattc atcatctctg 1680 
ttactgtgtt tttcactttt agcattcctt atttgatttt ttcttacagt ttccatttct 1740 
ttgttttact ttctatgtga atttttatct atattcttcc cagatagcag ccatactggg 1800 
caagatttgg attcccttag ttttttctta tgtctctttt gcatatctta ggtttggaaa 1860 
aaccttttca cccagaaatc accaagagct ctccaaaata ggacgttggt gcactcaagc 1920 
tagctctaga aagcaggaaa tgacaggaga gtcacagact ggttgcattg tagagcaaag 1980 
agctgctggg tgtatccatc aacctgattg attttctcag ggtcagtggt cacctaagtc 2040
tattctttct ttccttatct acttcagaag accatcaaaa tatttactca tgttaaacct 2100
aaacaaaatg aatatgggtt acttttacat atcttgtgtt cttgaataag tcataggtcc 2160 
tggattttgg cagtttggta acgtaaggtt tatagaactt cacaagggcc catttttctg 2220 
tcagttccaa aacaagaaac ttcctgagtc cttcatctaa aatatattgt ctcggctggg 2280 
tgcagtggct cgtgcctgtt atcccagcac tttgggaggc tgaggcggat ggatcacctg 2340 
aggttaggac ttcgagacca gcctgaccaa tatggtgaaa ccctgtctct actaaaaatg 2400 
caaaaattag ccaggcgtgg tggcatgcac ctgtaatccc agctactcag gaggctgaga 2460
taagagaatt gcttgaaccc aggaggtgga ggttgtagtg agctgagatt gcaccactgc 2520 
actccagcct gggcgacaga gtgaaactct gtctcaaaaa aaaaaaaaaa aatgtctcta 2580 
tctggccaca gtcacaaatg tttgttcatt tgttcattca ttcattcaaa tgttttgtaa 2640 
gcctgctatc tcagcgttac tacattccat tcagattaca ctgatgaaca agatgtcttt 2700 
cctccaggag ctagagagat tcctacttca ctaatacaag agtgtggtta gtactctaat 2760 
ggaggtgcaa catgctatgg gacacagagg gtgtagtatt tcatttgggc taggggagat 2820 
tggttagtgc tttctggaaa aggtagcatt gtaactgggt tttaaaaaat tattaggatc 2880 
ttgacaggca aagaggtgga tggccattcg aagctaagta aacagcttat gtaaaggcac 2940
taattcatga agcatttggt aaacaattta tgttctattc ctttgagagc ctggttcatt 3000 
ttcttctctt actccggtta ttagacttac tatttgttgt tgtcctttct ctttttctgg 3060 
ctatttttac ctcctttgtt ttcctatagt tcctcatggt agatcttatg gcattagttt 3120 
tatagtctag gacacagaga tgaaggatca cctgtattgc ctccaagtgg aagtgcaggg 3180 
caacattatt tctctattta acctgtgttt cagtgtgtgt acttagaata gtaaagtgaa 3240 
tcttgcatga atgtaggccc tgcccacagg gcagatgact ccatactaga acatagtgga 3300 
atagacaaaa accttctaca gcatgtatga gacacttggc ccatcgaccc tcttcatgcc 3360 
ctttacattc agcaccctca tattgacttc tctctcctct ttcctaccaa gcaagggagt 3420 
actgttcaaa gacgcaaatg cattctgccc tagtttcttt ttattgctaa aaacatttat 3480 
ctttacccta caacctactt ttctatttat tttcaacatt tagcaggttg tttaaaaagg 3540 
gaccaaaaaa taaaacagga ccatcttcct tgtttcaggg actggtaggc aggcattaag 3600 
gttaaggtag gggttaagac cagatcctat tttgcagtct gcctgggagg tgaaaaacct 3660 
gggaagaaga ccgctggtag catatgtatg gaaaggagac aggctgccct tacatctttt 3720 
caggaggaaa aactgccagg gggagccagg catatatgga gaagaatcct taatggttta 3780 
tactcttggg aagtcctrta cccagccagt tatttgcttt gacttggctg tttaaggtct 3840 
ggttctggtc tttttttttc cccctaacca agacaaatga ggctcaatta aggaraaggg 3900 
acataagata cctattccaa aactgaattc cttttaactc tcatgaaatg acaaatagaa 3960 
ttgttagtat atgtgagcac tgagaattac tttattgatg aacactggta ttttctctgg 4020 
tgtaggaagc tgaaccatct atctccagaa atgtcttcag aaagtaaaga gcaacataac 4080 
gtttcaccca gagactcagc tgaaggaaat gacagttatc catctgggat ccatctggaa 4140 
cttcaaaggg aatcaagtac tgacttcaag caatttgaga ccaatgatca atgcagacct 4200
tatcatagga tccttattga gcgtcaagag aaatcagata caaacttcaa ggagtttgtt 4260
attaaaaagc tgcagaagaa ttgccagtgc agtccagcca aagccaaaaa tatgatttta 4320
ggtttccttc ctgttttgca gtggctccca aaatacgacc taaagaaaaa cattttaggg 4380 
gatgtgatgt caggcttgat tgtgggcata ttattggtgc cccagtccat tgcttattcc 4440 
ctgctggctg gccaagaacc tgtctatggt ctgtacacat ctttttttgc cagcatcatt 4500 
tattttctct tgggtacctc ccgtcacatc tctgtgggca tttttggagt actgtgcctt 4560 
atgattggtg agacagttga ccgagaacta cagaaagctg gctatgacaa tgcccatagt 4620 
gctccttcct taggaatggt ttcaaatggg agcacattat taaatcatac atcagacagg 4680 
atatgtgaca aaagttgcta tgcaattatg gttggcagca ctgtaacctt tatagctgga 4740
gtttatcagg taagcagcaa tgaaacaatt ggttatttct agaaaagtaa tctagtacat 4800
gaaatctcat atctctaagg gatctgagga atcacaataa ttaaaggtat catttattga 4860
gagttcagga tatatgaagg gtagaggcaa aattcaaacc ctaacctgac tccacaggta 4920
atataaggct ggttcactgg acctccacca cccagtacaa ctccttaatt ttacatgtca 4980
gaaaatcttg gctttgcttg agattatttg tggctggtta ttggcagagt cagcattagc 5040 
agttaggcaa gtgggtaaca gaatggagtt gagagtgcag gagtttctca cttttttttt 5100 
ttttctggag acagggtctc actctgtcac gctggagtgc agtggcacta tcttagttca 5160 
ctgcaacgtc cgcctccctg gctcaagcag tcctcctacc tcaacctcct gagtagctag 5220 
gactacaggc acatgctacc acacctggct aattttattt tattttattt tattttattt 5280 
tttattttta ttttttgtag agacagggtt ttgccacgtt gcccaggctg gtttcaaact 5340 
cctgagctca agcaatcctc ccgtcttggc ctcccaaagt gctgggatta tagccatgag 5400 
ccaccacacc cagcctcaaa ttctaaatgt ctcttacctt ccattaaaat tgctgatcta 5460
ttgagcaact cttactaaag gtagtggttg tcttggattg ttggggaggg agggaaaaag 5520 
ttggggacca cagtttcata ttatcagcca ggagaaagga taagaaatca aattcttgag 5580 
tctcccatag aatccactaa tctgtcatta tcatcatgcc cctggctttt ggcatccagg 5640 
agtcagtgcc aggattaaac cttctctaat gcaggcattt caaaccaaca agggaagggg 5700 
aagagtagct cactttagtt ggtgctcaga tgagtgggga gggagagtga agatggtgtg 5760 
aagatgagct gtctactcat atataatggt aaataataag tctacttact tatttattat 5820
ttattcattt atttataaag agacagggtc tctctatgac caaactcctg ggctcaagtg 5880
atcctcctaa tattgcctcc ccaaatgctg ggattacagg catgagccat cacgcccaac 5940
caacttttgc ctttttgtta gtatgtccca ccaagaagga agaaggcata acaattctga 6000 
aaacttatta gacagaggaa aatataaaga agtaaaaatg cagaattttt attaatatgg 6060 
gagacagtgt ggcataagta catatatact gcatgagaat ggtttcttag tatgaggtta 6120 
aagataagtc tacaataatt tttaaagtgt gattctactt tgatgtaaat ctaatttttt 6180 
gttttaccaa ttaaaacttc acttgtacac ttgctcttag ccaagaggct gagaagccgt 6240 
aagacttcac ttttacagta gtgatttgta atttaaggaa aatacttggt ttcttaacta 6300 
gaataatttt ttccaatttg aagttttctt gtggatcctt gagaatgttt ttcttttaaa 6360 
agaggtctgt tctttgtgat gggaagaatg aaaaaaaaaa gaggtatgaa ccttattcaa 6420 
gtttaagaaa cgtatgaaaa gaaagaaatc caaagttcct gtctcacctg ggttaataag 6480 
taacagtgtg accttgggca agttgcttag ccctttaaac ataattttca tctttgtaaa 6540 
atgagaagat tgatatatga ttgtgtttat tctagctctg acattctgtg atgctctgat 6600 
gatatgtctc catgcaagaa atgtcaggat aatataaaat ttagaagttc ttttccattt 6660 
atatttaaca cttctatatc cttccttcca ggtagcgatg ggcttctttc aagtgggttt 6720 
tgtttctgtc tacctctcag atgccttgct gagtggattt gtcactggtg cctccttcac 6780 
tattcttaca tctcaggcca agtatcttct tgggctcaac cttcctcgga ctaatggtgt 6840 
gggctcactc atcactacct ggatacatgt cttcagaaac atccataaga ccaatctctg 6900 
tgatcttatc accagccttt tgtgcctttt ggttcttttg ccaaccaaag aactcaatga 6960 
acacttcaaa tccaagctta aggcaccgat tcctattgaa cttgttgttg ttgtagcagc 7020
cacattagcc tctcattwtg gaaaactaca tgaaaattat aattctagta ttgctggaca 7080
tattcccact gggtttatgc cacccaaagt accagaatgg aacctaattc ctagtgtggc 7140
tgtagatgca atagctattt ccatcattgg ttttgctatc actgtatcac tttctgagat 7200
gtttgccaag aaacatggtt acacagtcaa agcaaaccag gaaatgtatg ccattggctt 7260 
ttgtaatatc atcccttcct tcttccactg ttttactact agtgcagctc ttgcaaagac 7320 
attggttaaa gaatcaacag gctgccatac tcagctttct ggtgtggtaa cagccctggt 7380
tcttttgttg gtcctcctag taatagctcc tttgttctat tcccttcaaa aaagtgtcct 7440 
tggtgtgatc acaattgtaa atctacgggg agcccttcgt aaatttaggg atcttcccaa 7500 
aatgtggagt attagtagaa tggatacagt tatctggttt gttactatgc tgtcctctgc 7560 
actgctaagt actgaaatag gcctacttgt tggggtttgt ttttctatat tttgtgtcat 7620 
cctccgcact cagaagccaa agagttcact gcttggcttg gtggaagagt ctgaggtctt 7680 
tgaatctgtg tctgcttaca agaaccttca gaytaagcca ggcatcaaga ttttccgctt 7740 
tgtagcccct ctctactaca taaacaaaga atgctttaaa tctgctttat acaaacaaac 7800 
tgtcaaccca atcttaataa aggtggcttg gaagaaggca gcaaagagaa agatcaaaga 7860
aaaagtagtg actcttggtg gaatccagga tgaaatgtca gtgcaacttt cccatgatcc 7920 
cttggagctg catactatag tgattgactg cagtgcaatt caatttttag atacagcagg 7980 
gatccacaca ctgaaagaag ttcgcagaga ttatgaagcc attggaatcc aggttctgct 8040 
ggctcagtgc aatcccwctg tgagggattc cctaaccaac ggagaatatt gcaaaaagga 8100 
agaagaaaac cttctcttct atagtgtgta tgaagcgatg gcttttgcag aagtatctaa 8160 
aaatcagaaa ggagtatgtg ttcccaatgg tctgagtctt agtagtgatt aattgagaag 8220 
gtagatagaa gaatgtctag ccaataggtt aaaatttcaa gtgtccaaca tttcccagtt 8280 
ccacagtggg aaattttgca cacttgaaat tttaaccaag tggctagata ttattcctcc 8340 
tttgaagcta atggcatttg tatatacaca ctgcagcaga gcttgtagct ggacagagtc 8400
aaaaagaaga aaatacggtt tcaggctttc ttgcagatat gaagtattct tggaatgcaa 8460
taagtatgta ttgaactgta ctgtaaagta gctccaaaac ttaattactc tcctgtttta 8520 
ggggttatac atttggactg tgcattctcc aagagatgaa gcggtgaagt tgggatttac 8580 
attggaagtg ctgtagactt ctttatgtgg ctcagtggag agagggaaag aatgttgcac 8640 
ctgctctagt accataggtc aagaggcttc tggatcacaa agtcataact agacaggttt 8700 
gttcttgtag ttttctatcc ccagtctttg ctccccagat ggcagtagtt tttagtagga 8760
aagtgccatt cctgtcctta aggcacagtc tcatcagaag tctaatacct gggcaggttt 8820 
ataacatcct gagagccagc ctgacattag acagaatacc ctttgtaata cattggaaat 8880 
ttttactcat gcctttttgt ttaggataaa taggtaagca caaagagctc ttcaaaatca 8940
gaaaaaacaa taggagtcct tccttgtctt ttctgtgatc tctgtccttg tttctgagac 9000 
tttctctacc attaagctct attttagctt tcagttattc tagtttgttt cccatggaat 9060 
ctgtcctaaa ctggtgtttt tgtcagtgac agtcttgcca gtcagcaatt tctaacagca 9120 
ttttaaatga gtttgatgta cagtaaatat tgatgacaat gacagctttt aactcttcaa 9180 
gtcacctaaa gctattatgc aggaggattt agaagtcaca ttcataaaac ccaagggcta 9240 
tgggtgtatt attcatgata gctggcccac aggtcatgaa ttgaggagga atttgctttc 9300 
aaaaagcaag aatgtccaac actgaaagtt tatagtttta tatttggacc ttgaaaggta 9360
agaaaaaacc aggttctcca aagttaggaa tagggaacta atttatgaaa cagccatctt 9420 
aaaaaaaaaa aaagtaaact gcaaaagtac aaaatcattt ttcaatctgt tcccagtttc 9480 
taaacaattt taaatattta tgagaagcaa accctatgtg tagggcatct gttggagtgg 9540 
gatgctttta gacatatatt aagtatgtac atgtttaata tgtatattta aaatgcatat 9600 
atattttatt atatctatat tatcctatat agatatatgt aacttagctt tattgttagc 9660 
tccataagct gccagtgttg cttttctgtt ggtagagctc tcccatttgg tgacatggaa 9720 
aatacctttc cattatcaca acaaagcagt tgctcagtag aaagtctaga tttctgtctt 9780 
ataggtgatt tctgtcttat aggtgattat aatcaagtgt aggcttcctg aattttgaca 9840 
tccttttaga acttgggtct ggaattccag aaatgttaat tgctgcttgt atttgttctt 9900 
gtttgttttt tagccagtat ttgccctttc tatccagcct tatgaataat agcagtaaaa 9960
tcacagtatc ttggtcagtc tttatttttt tccttttttc ttttttaaga gacagtcatc 10020
caggccagag tgcagtttga tgatagctta ctgaagcttc ccactcctgg gctcaagtta 10080
tccttccatt ttggcctcct gagtagctag accataggta tgcatcacca caccctgcta 10140
attttttaaa tttttttcta gagagagggt ctcactgtgt tgcccaggct ggtctcaaac 10200 
tccaggctca agcaatcctt cagcctcagc ctcccagagt gttgggatta caggcgtgag 10260 
ccactgcact tggccaagtt atttattttt aatctctctt gcccttctcc caaggcaggc 10320 
ttaagttgag actattatag gtgtctaata acctgtgaca gagtaatgag tacatgctta 10380 
agatgttata attagccaac accaacacag caaaaaatat aattccagcc aaagattctg 10440 
gaaaatccct cagaaggagg gataacagga tttgaccttt accagcgatt tctgtccata 10500 
tgtggatgta aacagttctg gaacgttatg catgcagtta gcgaatcctt gaattatgtt 10560 
ctggtttgta cttgtcccat ccatccaaac aagagattct gcttttggta gccatctgta 10620 
gaaacattta agatgtcact agaatttaca tttcatcctc tctacttggg ttgaggttgc 10680 
ctatacttgc atattgttaa aatgttttgg ttgctgatat tcagaggaat gaaacctgga 10740 
accaaagcct aatttgccga taaaaaaact gttttcggcc aggtgcagtg gctcatgcct 10800 
gtaatcccag cacgttggga ggccgaggcg ggtggatcac ctgaagtcag gagttcgaga 10860
ccatcctggc taacactgtg aaaccccgtc tctactaaaa atacaaaaaa ttagcggggc 10920 
atggtggcac gcgcctgtag tcccagctac tcaggaggct gaggcaggag aattacttga 10980 
acccgggagg cggaggttgc agtaagcaga gattgtgcca ctggactcca gcctgggtga 11040 
cagagcgaga ctccgtctca aacaaacaaa caaaaaactg ttttcatttg ctctcttgac 11100 
caaaggatag gactttagtt ctttaagcat tattttaaac actatattga tacaaaaata 11160 
tcttgcttac tctaaacttt agagtctaaa tgaagctttt tctcagtaca agattctgag 11220 
tatcataaaa tggttattta attgaaacgt agtgtggtat actcttgatg gttagaactc 11280 
ttacagcctt atttattttt aagtttgtta cagccaaagg gttggagtgt gccagtgcac 11340 
aggtagacta aggaaaacat tatagaggag tgaagagaac agaccattga aaagactatt 11400
atctgaccag cggaggcaga aaagagagga acccagttga ataggatcca atccctggtt 11460 
agcctctaca caataatagg gagacaagga ttaggagcca tacctcccag agcaaggtat 11520 
ctttctagag caaatttctc tttctagaag gggagggtca cagggtcaca gattcaccaa 11580
agctgaaagg gctgaggagc tcatggtagc ctgggttgac ctactctgga gcacggtgtc 11640 
ttccttctaa actgagtgac tgtagtacta tctgtgcctc tgatggtaat aaaactgaca 11700 
agatgtctaa ttttttttta agtaggacca aaggaaaaca agatttagat agtctgactt 11760
tgcttttgaa caacagacat tgcaagtcaa aattgttgtc aaatttacat atggtaaatg 11820 
atgaacttta aaaatgtgtc caggtgttag atgagttcat tagactcttt taatgctaat 11880 
ggctagtacg tttaaacaaa acagcagttc tctgctgcaa tattcccatt gaccacttaa 11940 
atgaccataa gtggtcattt aagaacatgt tagggttagc cctgatctga atataaaagt 12000 
gagaaaaggg ctacagtgca tttcttggta acttaaactg agtcttgaag ttataatgat 12060 
ccattcgagt tctgtgatcc ttattgttct taattgtgtt tctctacgta ttgttacaga 12120 
tgagccatac gtttctttgt atcaatgtag acatgacttc agatacctct gaggacctac 12180 
ccagcagtct aggaccctgg gccaagtgct gg 12212 

<210> 2 

<211> 2220 

<212> DNA 

<213> Homo sapiens 

<400> 2 

atgtcttcag aaagtaaaga gcaacataac gtttcaccca gagactcagc tgaaggaaat 60 
gacagttatc catctgggat ccatctggaa cttcaaaggg aatcaagtac tgacttcaag 120
caatttgaga ccaatgatca atgcagacct tatcatagga tccttattga gcgtcaagag 180
aaatcagata caaacttcaa ggagtttgtt attaaaaagc tgcagaagaa ttgccagtgc 240
agtccagcca aagccaaaaa tatgatttta ggtttccttc ctgttttgca gtggctccca 300
aaatacgacc taaagaaaaa cattttaggg gatgtgatgt caggcttgat tgtgggcata 360
ttattggtgc cccagtccat tgcttattcc ctgctggctg gccaagaacc tgtctatggt 420
ctgtacacat ctttttttgc cagcatcatt tattttctct tgggtacctc ccgtcacatc 480
tctgtgggca tttttggagt actgtgcctt atgattggtg agacagttga ccgagaacta 540
cagaaagctg gctatgacaa tgcccatagt gctccttcct taggaatggt ttcaaatggg 600
agcacattat taaatcatac atcagacagg atatgtgaca aaagttgcta tgcaattatg 660
gttggcagca ctgtaacctt tatagctgga gtttatcagg tagcgatggg cttctttcaa 720
gtgggttttg tttctgtcta cctctcagat gccttgctga gtggatttgt cactggtgcc 780
tccttcacta ttcttacatc tcaggccaag tatcttcttg ggctcaacct tcctcggact 840
aatggtgtgg gctcactcat cactacctgg atacatgtct tcagaaacat ccataagacc 900
aatctctgtg atcttatcac cagccttttg tgccttttgg ttcttttgcc aaccaaagaa 960
ctcaatgaac acttcaaatc caagcttaag gcaccgattc ctattgaact tgttgttgtt 1020
gtagcagcca cattagcctc tcattttgga aaactacatg aaaattataa ttctagtatt 1080
gctggacata ttcccactgg gtttatgcca cccaaagtac cagaatggaa cctaattcct 1140
agtgtggctg tagatgcaat agctatttcc atcattggtt ttgctatcac tgtatcactt 1200
tctgagatgt ttgccaagaa acatggttac acagtcaaag caaaccagga aatgtatgcc 1260
attggctttt gtaatatcat cccttccttc ttccactgtt ttactactag tgcagctctt 1320
gcaaagacat tggttaaaga atcaacaggc tgccatactc agctttctgg tgtggtaaca 1380
gccctggttc ttttgttggt cctcctagta atagctcctt tgttctattc ccttcaaaaa 1440
agtgtccttg gtgtgatcac aattgtaaat ctacggggag cccttcgtaa atttagggat 1500
cttcccaaaa tgtggagtat tagtagaatg gatacagtta tctggtttgt tactatgctg 1560
tcctctgcac tgctaagtac tgaaataggc ctacttgttg gggtttgttt ttctatattt 1620
tgtgtcatcc tccgcactca gaagccaaag agttcactgc ttggcttggt ggaagagtct 1680
gaggtctttg aatctgtgtc tgcttacaag aaccttcaga ctaagccagg catcaagatt 1740
ttccgctttg tagcccctct ctactacata aacaaagaat gctttaaatc tgctttatac 1800
aaacaaactg tcaacccaat cttaataaag gtggcttgga agaaggcagc aaagagaaag 1860
atcaaagaaa aagtagtgac tcttggtgga atccaggatg aaatgtcagt gcaactttcc 1920
catgatccct tggagctgca tactatagtg attgactgca gtgcaattca atttttagat 1980
acagcaggga tccacacact gaaagaagtt cgcagagatt atgaagccat tggaatccag 2040
gttctgctgg ctcagtgcaa tcccactgtg agggattccc taaccaacgg agaatattgc 2100
aaaaaggaag aagaaaacct tctcttctat agtgtgtatg aagcgatggc ttttgcagaa 2160
gtatctaaaa atcagaaagg agtatgtgtt cccaatggtc tgagtcttag tagtgattaa 2220



<210> 3 
<211> 739 
<212> PRT 

<213> Homo sapiens 
<400> 3 

Met Ser Ser Glu Ser Lys Glu Gln His Asn Val Ser Pro Arg Asp Ser
1 5 10 15

Ala Glu Gly Asn Asp Ser Tyr Pro Ser Gly Ile His Leu Glu Leu Gln
20 25 30

Arg Glu Ser Ser Thr Asp Phe Lys Gln Phe Glu Thr Asn Asp Gln Cys
35 40 45

Arg Pro Tyr His Arg Ile Leu Ile Glu Arg Gln Glu Lys Ser Asp Thr
50 55 60

Asn Phe Lys Glu Phe Val Ile Lys Lys Leu Gln Lys Asn Cys Gln Cys
65 70 75 80

Ser Pro Ala Lys Ala Lys Asn Met Ile Leu Gly Phe Leu Pro Val Leu
85 90 95

Gln Trp Leu Pro Lys Tyr Asp Leu Lys Lys Asn Ile Leu Gly Asp Val
100 105 110

Met Ser Gly Leu Ile Val Gly Ile Leu Leu Val Pro Gln Ser Ile Ala
115 120 125

Tyr Ser Leu Leu Ala Gly Gln Glu Pro Val Tyr Gly Leu Tyr Thr Ser
130 135 140

Phe Phe Ala Ser Ile Ile Tyr Phe Leu Leu Gly Thr Ser Arg His Ile
145 150 155 160

Ser Val Gly Ile Phe Gly Val Leu Cys Leu Met Ile Gly Glu Thr Val
165 170 175

Asp Arg Glu Leu Gln Lys Ala Gly Tyr Asp Asn Ala His Ser Ala Pro
180 185 190

Ser Leu Gly Met Val Ser Asn Gly Ser Thr Leu Leu Asn His Thr Ser
195 200 205

Asp Arg Ile Cys Asp Lys Ser Cys Tyr Ala Ile Met Val Gly Ser Thr
210 215 220

Val Thr Phe Ile Ala Gly Val Tyr Gln Val Ala Met Gly Phe Phe Gln
225 230 235 240

Val Gly Phe Val Ser Val Tyr Leu Ser Asp Ala Leu Leu Ser Gly Phe
245 250 255

Val Thr Gly Ala Ser Phe Thr Ile Leu Thr Ser Gln Ala Lys Tyr Leu
260 265 270

Leu Gly Leu Asn Leu Pro Arg Thr Asn Gly Val Gly Ser Leu Ile Thr
275 280 285

Thr Trp Ile His Val Phe Arg Asn Ile His Lys Thr Asn Leu Cys Asp
290 295 300

Leu Ile Thr Ser Leu Leu Cys Leu Leu Val Leu Leu Pro Thr Lys Glu
305 310 315 320

Leu Asn Glu His Phe Lys Ser Lys Leu Lys Ala Pro Ile Pro Ile Glu
325 330 335

Leu Val Val Val Val Ala Ala Thr Leu Ala Ser His Phe Gly Lys Leu
340 345 350

His Glu Asn Tyr Asn Ser Ser Ile Ala Gly His Ile Pro Thr Gly Phe
355 360 365

Met Pro Pro Lys Val Pro Glu Trp Asn Leu Ile Pro Ser Val Ala Val
370 375 380

Asp Ala Ile Ala Ile Ser Ile Ile Gly Phe Ala Ile Thr Val Ser Leu
385 390 395 400

Ser Glu Met Phe Ala Lys Lys His Gly Tyr Thr Val Lys Ala Asn Gln
405 410 415



Glu Met Tyr Ala Ile Gly Phe Cys Asn Ile Ile Pro Ser Phe Phe His
420 425 430

Cys Phe Thr Thr Ser Ala Ala Leu Ala Lys Thr Leu Val Lys Glu Ser
435 440 445

Thr Gly Cys His Thr Gln Leu Ser Gly Val Val Thr Ala Leu Val Leu
450 455 460

Leu Leu Val Leu Leu Val Ile Ala Pro Leu Phe Tyr Ser Leu Gln Lys
465 470 475 480

Ser Val Leu Gly Val Ile Thr Ile Val Asn Leu Arg Gly Ala Leu Arg
485 490 495



Lys Phe Arg Asp Leu Pro Lys Met Trp Ser Ile Ser Arg Met Asp Thr
500 505 510

Val Ile Trp Phe Val Thr Met Leu Ser Ser Ala Leu Leu Ser Thr Glu
515 520 525

Ile Gly Leu Leu Val Gly Val Cys Phe Ser Ile Phe Cys Val Ile Leu
530 535 540

Arg Thr Gln Lys Pro Lys Ser Ser Leu Leu Gly Leu Val Glu Glu Ser
545 550 555 560

Glu Val Phe Glu Ser Val Ser Ala Tyr Lys Asn Leu Gln Thr Lys Pro
565 570 575

Gly Ile Lys Ile Phe Arg Phe Val Ala Pro Leu Tyr Tyr Ile Asn Lys
580 585 590

Glu Cys Phe Lys Ser Ala Leu Tyr Lys Gln Thr Val Asn Pro Ile Leu
595 600 605

Ile Lys Val Ala Trp Lys Lys Ala Ala Lys Arg Lys Ile Lys Glu Lys
610 615 620

Val Val Thr Leu Gly Gly Ile Gln Asp Glu Met Ser Val Gln Leu Ser
625 630 635 640



His Asp Pro Leu Glu Leu His Thr Ile Val Ile Asp Cys Ser Ala Ile
645 650 655

Gln Phe Leu Asp Thr Ala Gly Ile His Thr Leu Lys Glu Val Arg Arg
660 665 670

Asp Tyr Glu Ala Ile Gly Ile Gln Val Leu Leu Ala Gln Cys Asn Pro
675 680 685

Thr Val Arg Asp Ser Leu Thr Asn Gly Glu Tyr Cys Lys Lys Glu Glu
690 695 700

Glu Asn Leu Leu Phe Tyr Ser Val Tyr Glu Ala Met Ala Phe Ala Glu
705 710 715 720



Val Ser Lys Asn Gln Lys Gly Val Cys Val Pro Asn Gly Leu Ser Leu
725 730 735 



Ser Ser Asp 



<210> 4 

<211> 15 

<212> DNA 

<213> Homo sapiens 

<400> 4 

<210> 5

<211> 15 

<212> DNA 

<213> Homo sapiens 

<400> 5 

ttaaggaraa gggac 

<210> 6 

<211> 15 

<212> DNA

<213> Homo sapiens 

<400> 6 

tctcattwtg gaaaa 



<210> 7 

<211> 15 

<212> DNA 

<213> Homo sapiens 

<400> 7 

caatcccwct gtgag 



<210> 8 

<211> 15 

<212> DNA 

<213> Homo sapiens 

<400> 8 

cttgggaagt cctrt 

<210> 9 
<211> 15 
<212> DNA 

<213> Homo sapiens 
<400> 9 

aactggctgg gtaya 
<210> 10

<211> 15

<212> DNA 

<213> Homo sapiens 

<400> 10 

gctcaattaa ggara 15 



<210> 11 

<211> 15 

<212> DNA 

<213> Homo sapiens 



<400> 11 

tcttatgtcc cttyt 15 



<210> 12 

<211> 15 

<212> DNA 

<213> Homo sapiens 

<400> 12

ttagcctctc attwt 15 



<210> 13 

<211> 15 

<212> DNA 

<213> Homo sapiens 

<400> 13 

atgtagtttt ccawa 15 



<210> 14 

<211> 15 

<212> DNA 

<213> Homo sapiens 

<400> 14 

tcagtgcaat cccwc 15 



<210> 15 
<211> 15 
<212> DNA 
<213> Homo sapiens

<400> 15

gaatccctca cagwg 15



<210> 16 
<211> 10 
<212> DNA 

<213> Homo sapiens 
<400> 16 

gggaagtcct 10 

<210> 17

<211> 10

<212> DNA

<213> Homo sapiens

<400> 17

tggctgggta 10

<210> 18

<211> 10

<212> DNA

<213> Homo sapiens

<400> 18

caattaagga 10

<210> 19

<211> 10

<212> DNA

<213> Homo sapiens

<400> 19

tatgtccctt 10



<210> 20 



<211> 10 



<212> DNA 

<213> Homo sapiens 



<400> 20 
gcctctcatt

<210> 21 

<211> 10 

<212> DNA 

<213> Homo sapiens 

<400> 21 
tagttttcca 



<210> 22 

<211> 10

<212> DNA 

<213> Homo sapiens 

<400> 22 
gtgcaatccc 

<210> 23 

<211> 10 

<212> DNA 

<213> Homo sapiens 

<400> 23 
tcccrcacag 

<210> 24

<211> 12212 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> allele 

<222> (3798) 

<223> PS1: Polymorphic base G or A
<220> 

<221> allele 
<222> (3895) 

<223> PS2: Polymorphic base A or G
<220> 

<221> allele 
<222> (7038)
<223> PS3: Polymorphic base T or A

<220> 

<221> allele 
<222> (7713) 

<223> PS4: Polymorphic base C or T 
<220>

<221> allele 
<222> (8057) 

<223> PS5: Polymorphic base A or T 

<400> 24
gtgtttatct gagaaagttt ttatttcttt ttcatatttt attttattta agacagagtc 60
tctccctgtc acccaggcta aagtgcagtg gtgtgatcat ggcttattgc agcctcaacc 120 
tcttaggctc aagcagtctc cctacctcag cctcccaagt agctgggacc ataggcacac 180
accaccatgc ctggcctctt atcttttaaa atattttttc taggtaaaga attctatgtt 240
aaaagttata tttctggggt cagatgtggt ggctcacacc tgtaatccta gcactttggg 300 
aggccagggt gggaggatca cttgaggcca agagtttggg aaccagcctg gggagcttag 360 
tagcaagacc ttgtctctac aaaaaaaatt aaaaaatgag ccaggcttgg tcgtgcacac 420 
ctgtaatcct cgctactcag gaggctgagg tgggagaatc acttgagccc aggagttcga 480
ggccatggtg agccataatt gcatgtcact gcacttcagc ctgggcaaca gggcaagacc 540 
ctatttctat aaaagata-a tatttctttg atacaaagat accactccat catgttctgg 600 
cctccatgat ttccaataag aagtctgcta taattcttat ttttgtttct gtataagtaa 660 
tatatgtttt ttctttgtct tcaggatttt ctctttatct tttgttttca gcagcttgaa 720 
tatcatatgt gtaggtgagg gtttttgttt tctttttgtt tttatattta ctcttcttgg 780 
aattctctaa gattcttgga gatagattat atattatata tatatgtatt ttttttttga 840 
gacagagttt tgctttgtca ccatgctgga gtgcagtggc gcgatcttgg ctcactgcag 900 
cctccgcctc ccaggttcaa gcaattctcc tgcctcagcc tcctgagtag ctgggactac 960 
aggagcgcac caccatgccc agctaatttt tgtatttttt tagaagagac ggggtttcac 1020 
catgttggcc aggatggtct tgarctcttg accttgtgat ctgcctgcct tggcctccgg 1080
aagtgttgga attacaggtg tgagccactg cacccggcct tttttttttt tttttttttt 1140 
gagaaggagt tttcctcttg ctgtccagac aggagtgcaa tggcgtgatc tcggctcact 1200 
gcaacctcca gttcctgggt tccagcgatt ctcctgcctc ggcctcgcga gtagctggga 1260 
ttacaggcgt gtgccaccat gcctgactaa ttttatattt ttagtagaga ccctgtttca 1320 
ccatgttggt catgctggtc ttgaactcct ggcctcaggt tacttggatc tatattttga 1380 
tgtctcttat taattttgga aatttcttgg ccattatatc ttgaa^tatt tcttctgccc 1440 
attctctctc ttcttttggt attctaatta tgcatatgtt tgatactttg atgtgtcccg 1500 
cagctcttgg atgttatgtt acatcgttga tattggggtt tttttgtttt gtttcgttat 1560 
ttaaactttt ctgcgtttta gtttagctaa tttctgttta tctgtctttt tgagttcatc 1620 
gattctttct ttggctttcg ggtctagtaa tgatcctatc gaaggcattc atcatctctg 1680 
ttactgtgtt tttcactttt agcattcctt atttgatttt ttcttacagt ttccatttct 1740 
ttgttttact ttctatgtga atttttatct atattcttcc cagatagcag ccatactggg 1800 
caagatttgg attcccttag ttttttctta tgtctctttt gcatatctta ggtttggaaa 1860 
aaccttttca cccagaaatc accaagagct ctccaaaata ggacgttggt gcactcaagc 1920 
tagctctaga aagcaggaaa tgacaggaga gtcacagact ggttgcattg tagagcaaag 1980 
agctgctggg tgtatccatc aacctgattg attttctcag ggtcagtggt cacctaagtc 2040 
tattctttct ttccttatct acttcagaag accatcaaaa tatttactca tgttaaacct 2100 
aaacaaaatg aatatgggtt acttttacat atcttgtgtt cttgaataag tcataggtcc 2160
tggattttgg cagtttggta acgtaaggtt tatagaactt cacaagggcc catttttctg 2220
tcagttccaa aacaagaaac ttcctgagtc cttcatctaa aatatattgt ctcggctggg 2280
tgcagtggct cgtgcctgtt atcccagcac tttgggaggc tgaggcggat ggatcacctg 2340
aggttaggac ttcgagacca gcctgaccaa tatggtgaaa ccctgtctct actaaaaatg 2400
caaaaattag ccaggcgtgg tggcatgcac ctgtaatccc agctactcag gaggctgaga 2460
taagagaatt gcttgaaccc aggaggtgga ggttgtagtg agctgagatt gcaccactgc 2520
actccagcct gggcgacaga gtgaaactct gtctcaaaaa aaaaaaaaaa aatgtctcta 2580
tctggccaca gtcacaaatg tttgrtcatt tgttcattca ttcattcaaa tgttttgtaa 2640
gcctgctatc tcagcgttac tacattccat tcagattaca ctgatgaaca agatgtcttt 2700
cctccaggag ctagagagat tcctacttca ctaatacaag agtgtggtta gtactctaat 2760
ggaggtgcaa catgctatgg gacacagagg gtgtagtatt tcatttgggc taggggagat 2820
tggttagtgc tttctggaaa aggtagcatt gtaactgggt tttaaaaaat tattaggatc 2880
ttgacaggca aagaggtgga tggccattcg aagctaagta aacagcttat gtaaaggcac 2940
taattcatga agcatttggt aaacaattta tgttctattc ctttgagagc ctggttcatt 3000
ttcttctctt actccggtta ttagacttac tatttgttgt tgtcctttct ctttttctgg 3060
ctatttttac ctcctttgtt ttcctatagt tcctcatggt agatcttatg gcattagttt 3120
tatagtctag gacacagaga tgaaggatca cctgtattgc ctccaagtgg aagtgcaggg 3180
caacattatt tctctattta acctgtgttt cagtgtgtgt acttagaata gtaaagtgaa 3240
tcttgcatga atgtaggccc tgcccacagg gcagatgact ccatactaga acatagtgga 3300
atagacaaaa accttctaca gcatgtatga gacacttggc ccatcgaccc tcttcatgcc 3360
ctttacattc agcaccctca tattgacttc tctctcctct ttcctaccaa gcaagggagt 3420
actgttcaaa gacgcaaatg cattctgccc tagtttcttt ttattgctaa aaacatttat 3480
ctttacccta caacctactt ttctatttat tttcaacatt tagcaggttg tttaaaaagg 3540
gaccaaaaaa taaaacagga ccatcttcct tgtttcaggg actggtaggc aggcattaag 3600
gttaaggtag gggttaagac cagatcctat tttgcagtct gcctgggagg tgaaaaacct 3660
gggaagaaga ccgctggtag catatgtatg gaaaggagac aggctgccct tacatctttt 3720
caggaggaaa aactgccagg gggagccagg catatatgga gaagaatcct taatggttta 3780
tactcttggg aagtcctrta cccagccagt tatttgcttt gacttggctg tttaaggtct 3840
ggttctggtc tttttttttc cccctaacca agacaaatga ggctcaatta aggaraaggg 3900
acataagata cctattccaa aactgaattc cttttaactc tcatgaaatg acaaatagaa 3960
ttgttagtat atgtgagcac tgagaattac tttattgatg aacactggta ttttctctgg 4020
tgtaggaagc tgaaccatct atctccagaa atgtcttcag aaagtaaaga gcaacataac 4080
gtttcaccca gagactcagc tgaaggaaat gacagttatc catctgggat ccatctggaa 4140
cttcaaaggg aatcaagtac tgacttcaag caatttgaga ccaatgatca atgcagacct 4200
tatcatagga tccttattga gcgtcaagag aaatcagata caaacttcaa ggagtttgtt 4260
attaaaaagc tgcagaagaa ttgccagtgc agtccagcca aagccaaaaa tatgatttta 4320
ggtttccttc ctgttttgca gtggctccca aaatacgacc taaagaaaaa cattttaggg 4380
gatgtgatgt caggcttgat tgtgggcata ttattggtgc cccagtccat tgcttattcc 4440
ctgctggctg gccaagaacc tgtctatggt ctgtacacat ctttttttgc cagcatcatt 4500
tattttctct tgggtacctc ccgtcacatc tctgtgggca tttttggagt actgtgcctt 4560
atgattggtg agacagttga ccgagaacta cagaaagctg gctatgacaa tgcccatagt 4620
gctccttcct taggaatggt ttcaaatggg agcacattat taaatcatac atcagacagg 4680
atatgtgaca aaagttgcta tgcaattatg gttggcagca ctgtaacctt tatagctgga 4740
gtttatcagg taagcagcaa tgaaacaatt ggttatttct agaaaagtaa tctagtacat 4800
gaaatctcat atctctaagg gatctgagga atcacaataa ttaaaggtat catttattga 4860
gagttcagga tatatgaagg gtagaggcaa aattcaaacc ctaacctgac tccacaggta 4920
atataaggct ggttcactgg acctccacca cccagtacaa ctccttaatt ttacatgtca 4980
gaaaatcttg gctttgcttg agattatttg tggctggtta ttggcagagt cagcattagc 5040
agttaggcaa gtgggtaaca gaatggagtt gagagtgcag gagtttctca cttttttttt 5100
ttttctggag acagggtctc actctgtcac gctggagtgc agtggcacta tcttagttca 5160
ctgcaacgtc cgcctccctg gctcaagcag tcctcctacc tcaacctcct gagtagctag 5220
gactacaggc acatgctacc acacctggct aattttattt tattttattt tattttattt 5280
tttattttta ttttttgtag agacagggtt ttgccacgtt gcccaggctg gtttcaaact 5340
cctgagctca agcaatcctc ccgtcttggc ctcccaaagt gctgggatta tagccatgag 5400
ccaccacacc cagcctcaaa ttctaaatgt ctcttacctt ccattaaaat tgctgatcta 5460
ttgagcaact cttactaaag gtagtggttg tcttggattg ttggggaggg agggaaaaag 5520
ttggggacca cagtttcata ttatcagcca ggagaaagga taagaaatca aattcttgag 5580
tctcccatag aatccactaa tctgtcatta tcatcatgcc cctggctttt ggcatccagg 5640
agtcagtgcc aggattaaac cttctctaat gcaggcattt caaaccaaca agggaagggg 5700
aagagtagct cactttagtt ggtgctcaga tgagtgggga gggagagtga agatggtgtg 5760
aagatgagct gtctactcat atataatggt aaataataag tctacttact tatttattat 5820
ttattcattt atttataaag agacagggtc tctctatgac caaactcctg ggctcaagtg 5880
atcctcctaa tattgcctcc ccaaatgctg ggattacagg catgagccat cacgcccaac 5940
caacttttgc ctttttgtta gtatgtccca ccaagaagga agaaggcata acaattctga 6000
aaacttatta gacagaggaa aatataaaga agtaaaaatg cagaattttt attaatatgg 6060
gagacagtgt ggcataagta catatatact gcatgagaat ggtttcttag tatgaggtta 6120
aagataagtc tacaataatt tttaaagtgt gattctactt tgatgtaaat ctaatttttt 6180
gttttaccaa ttaaaacttc acttgtacac ttgctcttag ccaagaggct gagaagccgt 6240
aagacttcac ttttacagta gtgatttgta atttaaggaa aatacttggt ttcttaacta 6300
gaataatttt ttccaatttg aagttttctt gtggatcctt gagaatgttt ttcttttaaa 6360
agaggtctgt tctttgtgat gggaagaatg aaaaaaaaaa gaggtatgaa ccttattcaa 6420
gtttaagaaa cgtatgaaaa gaaagaaatc caaagttcct gtctcacctg ggttaataag 6480
taacagtgtg accttgggca agttgcttag ccctttaaac ataattttca tctttgtaaa 6540
atgagaagat tgatatatga ttgtgtttat tctagctctg acattctgtg atgctctgat 6600
gatatgtctc catgcaagaa atgtcaggat aatataaaat ttagaagttc ttttccattt 6660
atatttaaca cttctatatc cttccttcca ggtagcgatg ggcttctttc aagtgggttt 6720
tgtttctgtc tacctctcag atgccttgct gagtggattt gtcactggtg cctccttcac 6780
tattcttaca tctcaggcca agtatcttct tgggctcaac cttcctcgga ctaatggtgt 6840
gggctcactc atcactacct ggatacatgt cttcagaaac atccataaga ccaatctctg 6900
tgatcttatc accagccttt tgtgcctttt ggttcttttg ccaaccaaag aactcaatga 6960
acacttcaaa tccaagctta aggcaccgat tcctattgaa cttgttgttg ttgtagcagc 7020
cacattagcc tctcattwtg gaaaactaca tgaaaattat aattctagta ttgctggaca 7080
tattcccact gggtttatgc cacccaaagt accagaatgg aacctaattc ctagtgtggc 7140
tgtagatgca atagctattt ccatcattgg ttttgctatc actgtatcac tttctgagat 7200
gtttgccaag aaacatggtt acacagtcaa agcaaaccag gaaatgtatg ccattggctt 7260
ttgtaatatc atcccttcct tcttccactg ttttactact agtgcagctc ttgcaaagac 7320
attggttaaa gaatcaacag gctgccatac tcagctttct ggtgtggtaa cagccctggt 7380
tcttttgttg gtcctcctag taatagctcc tttgttctat tcccttcaaa aaagtgtcct 7440
tggtgtgatc acaattgtaa atctacgggg agcccttcgt aaatttaggg atcttcccaa 7500
aatgtggagt attagtagaa tggatacagt tatctggttt gttactatgc tgtcctctgc 7560
actgctaagt actgaaatag gcctacttgt tggggtttgt ttttctatat tttgtgtcat 7620
cctccgcact cagaagccaa agagttcact gcttggcttg gtggaagagt ctgaggtctt 7680
tgaatctgtg tctgcttaca agaaccttca gaytaagcca ggcatcaaga ttttccgctt 7740
tgtagcccct ctctactaca taaacaaaga atgctttaaa tctgctttat acaaacaaac 7800
tgtcaaccca atcttaataa aggtggcttg gaagaaggca gcaaagagaa agatcaaaga 7860
aaaagtagtg actcttggtg gaatccagga tgaaatgtca gtgcaacttt cccatgatcc 7920
cttggagctg catactatag tgattgactg cagtgcaatt caatttttag atacagcagg 7980
gatccacaca ctgaaagaag ttcgcagaga ttatgaagcc attggaatcc aggttctgct 8040 

ggctcagtgc aatcccwctg tgagggattc cctaaccaac ggagaatatt gcaaaaagga 8100 
agaagaaaac cttctcttct atagtgtgta tgaagcgatg gcttttgcag aagtatctaa 8160 
aaatcagaaa ggagtatgtg ttcccaatgg tctgagtctt agtagtgatt aattgagaag 8220 
gtagatagaa gaatgtctag ccaataggtt aaaatttcaa gtgtccaaca tttcccagtt 8280 
ccacagtggg aaattttgca cacttgaaat tttaaccaag tggctagata ttattcctcc 8340 
tttgaagcta atggcatttg tatatacaca ctgcagcaga gcttgtagct ggacagagtc 8400 
aaaaagaaga aaatacggtt tcaggctttc ttgcagatat gaagtattct tggaatgcaa 8460 
taagtatgta ttgaactgta ctgtaaagta gctccaaaac ttaattactc tcctgtttta 8520 
ggggttatac atttggactg; tgcattctcc aagagatgaa gcggtgaagt tgggatttac 8580 
attggaagtg ctgtagactt ctttatgtgg ctcagtggag agagggaaag aatgttgcac 8640 
ctgctctagt accataggtc aagaggcttc tggatcacaa agtcataact agacaggttt 8700
gttcttgtag ttttctatcc ccagtctttg ctccccagat ggcagtagtt tttagtagga 8760
aagtgccatt cctgtcctta aggcacagtc tcatcagaag tctaatacct gggcaggttt 8820
ataacatcct gagagccagc ctgacattag acagaatacc ctttgtaata cattggaaat 8880 
ttttactcat gcctttttgt ttaggataaa taggtaagca caaagagctc ttcaaaatca 8940 
gaaaaaacaa taggagtcct tccttgtctt ttctgtgatc tctgtccttg tttctgagac 9000 
tttctctacc attaagctct attttagctt tcagttattc tagtttgttt cccatggaat 9060 
ctgtcctaaa ctggtgtttt tgtcagtgac agtcttgcca gtcagcaatt tctaacagca 9120 
ttttaaatga gtttgatgta cagtaaatat tgatgacaat gacagctttt aactcttcaa 9180 
gtcacctaaa gctattatgc aggaggattt agaagtcaca ttcataaaac ccaagggcta 9240 
tgggtgtatt attcatgata gctggcccac aggtcatgaa ttgaggagga atttgctttc 9300 
aaaaagcaag aatgtccaac actgaaagtt tatagtttta tatttggacc ttgaaaggta 9360 
agaaaaaacc aggttctcca aagttaggaa tagggaacta atttatgaaa cagccatctt 9420
aaaaaaaaaa aaagtaaact gcaaaagtac aaaatcattt ttcaatctgt tcccagtttc 9480 
taaacaattt taaatattta tgagaagcaa accctatgtg tagggcatct gttggagtgg 9540 
gatgctttta gacatatatt aagtatgtac atgtttaata tgtatattta aaatgcatat 9600 
atattttatt atatctatat tatcctatat agatatatgt aacttagctt tattgttagc 9660 
tccataagct gccagtgttg cttttctgtt ggtagagctc tcccatttgg tgacatggaa 9720 
aatacctttc cattatcaca acaaagcagt tgctcagtag aaagtctaga tttctgtctt 9780 
ataggtgatt tctgtcttat aggtgattat aatcaagtgt aggcttcctg aattttgaca 9840 
tccttttaga acttgggtct ggaattccag aaatgttaat tgctgcttgt atttgttctt 9900 
gtttgttttt tagccagtat ttgccctttc tatccagcct tatgaataat agcagtaaaa 9960 
tcacagtatc ttggtcagtc tttatttttt tccttttttc ttttttaaga gacagtcatc 10020 
caggccagag tgcagtttga tgatagctta ctgaagcttc ccactcctgg gctcaagtta 10080 
tccttccatt ttggcctcct gagtagctag accataggta tgcatcacca caccctgcta 10140 
attttttaaa tttttttcta gagagagggt ctcactgtgt tgcccaggct ggtctcaaac 10200 
tccaggctca agcaatcctt cagcctcagc ctcccagagt gttgggatta caggcgtgag 10260 
ccactgcact tggccaagtt atttattttt aatctctctt gcccttctcc caaggcaggc 10320 
ttaagttgag actattatag gtgtctaata acctgtgaca gagtaatgag tacatgctta 10380 
agatgttata attagccaac accaacacag caaaaaatat aattccagcc aaagattctg 10440 
gaaaatccct cagaaggagg gataacagga tttgaccttt accagcgatt tctgtccata 10500 
tgtggatgta aacagttctg gaacgttatg catgcagtta gcgaatcctt gaattatgtt 10560
ctggtttgta cttgtcccat ccatccaaac aagagattct gcttttggta gccatctgta 10620 
gaaacattta agatgtcact agaatttaca tttcatcctc tctacttggg ttgaggttgc 10680 
ctatacttgc atattgttaa aatgttttgg ttgctgatat tcagaggaat gaaacctgga 10740
accaaagcct aatttgccga taaaaaaact gttttcggcc aggtgcagtg gctcatgcct 10800
gtaatcccag cacgttggga ggccgaggcg ggtggatcac ctgaagtcag gagttcgaga 10860
ccatcctggc taacactgtg aaaccccgtc tctactaaaa atacaaaaaa ttagcggggc 10920 
atggtggcac gcgcctgtag tcccagctac tcaggaggct gaggcaggag aattacttga 10980 
acccgggagg cggaggttgc agtaagcaga gattgtgcca ctggactcca gcctgggtga 11040 
cagagcgaga ctccgtctca aacaaacaaa caaaaaactg ttttcatttg ctctcttgac 11100 
caaaggatag gactttagtt ctttaagcat tattttaaac actatattga tacaaaaata 11160 
tcttgcttac tctaaacttt agagtctaaa tgaagctttt tctcagtaca agattctgag 11220 
tatcataaaa tggttattta attgaaacgt agtgtggtat actcttgatg gttagaactc 11280 
ttacagcctt atttattttt aagtttgtta cagccaaagg gttggagtgt gccagtgcac 11340 
aggtagacta aggaaaacat tatagaggag tgaagagaac agaccattga aaagactatt 11400 
atctgaccag cggaggcaga aaagagagga acccagttga ataggatcca atccctggtt 11460
agcctctaca caataatagg gagacaagga ttaggagcca tacctcccag agcaaggtat 11520 
ctttctagag caaatttctc tttctagaag gggagggtca cagggtcaca gattcaccaa 11580
agctgaaagg gctgaggagc tcatggtagc ctgggttgac ctactctgga gcacggtgtc 11640 
ttccttctaa actgagtgac tgtagtacta tctgtgcctc tgatggtaat aaaactgaca 11700 
agatgtctaa ttttttttta agtaggacca aaggaaaaca agatttagat agtctgactt 11760 
tgcttttgaa caacagacat tgcaagtcaa aattgttgtc aaatttacat atggtaaatg 11820 
atgaacttta aaaatgtgtc caggtgttag atgagttcat tagactcttt taatgctaat 11880 
ggctagtacg tttaaacaaa acagcagttc tctgctgcaa tattcccatt gaccacttaa 11940 
atgaccataa gtggtcattt aagaacatgt tagggttagc cctgatctga atataaaagt 12000 
gagaaaaggg ctacagtgca tttcttggta acttaaactg agtcttgaag ttataatgat 12060 
ccattcgagt tctgtgatcc ttattgttct taattgtgtt tctctacgta ttgttacaga 12120 
tgagccatac gtttctttgt atcaatgtag acatgacttc agatacctct gaggacctac 12180 
ccagcagtct aggaccctgg gccaagtgct gg 12212 

<210> 25 

<211> 540 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> allele 
<222> (30) 

<223> PS1: Polymorphic base G or A
<220> 

<221> misc_feature 
<222> (61)..(120)

<223> nucleotides represent sequence between PS1 and PS2
<220> 

<221> allele 
<222> (150) 

<223> PS2: Polymorphic base A or G 
<220> 
<221> misc_feature

<222> (181)..(240)

<223> nucleotides represent sequence between PS2 and PS3 

<220> 

<221> allele 
<222> (270) 

<223> PS3: Polymorphic base T or A 
<220> 

<221> misc_feature

<222> (301)..(360)

<223> nucleotides represent sequence between PS3 and PS4 

<220> 

<221> allele

<222> (390)

<223> PS4: Polymorphic base C or T 

<220> 

<221> misc_feature 
<222> (421)..(480)

<223> nucleotides represent sequence between PS4 and PS5 
<220> 

<221> allele 
<222> (510) 

<223> PS5: Polymorphic base A or T 
<400> 25 

cttaatggtt tatactcttg ggaagtcctr tacccagcca gttatttgct ttgacttggc 60 
nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 120 
aaccaagaca aatgaggctc aattaaggar aagggacata agatacctat tccaaaactg 180 
nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 240
tgttgtagca gccacattag cctctcattw tggaaaacta catgaaaatt ataattctag 300 
nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 360 
atctgtgtct gcttacaaga accttcagay taagccaggc atcaagattt tccgctttgt 420 
nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 480 
tccaggttct gctggctcag tgcaatcccw ctgtgaggga ttccctaacc aacggagaat 540 